Abstract. We introduce a correspondence principle (analogous to the Furstenberg
correspondence principle) that allows one to extract an infinite random
graph or hypergraph from a sequence of increasingly large deterministic graphs
or hypergraphs. As an application we present a new (infinitary) proof of the
hypergraph removal lemma of Nagle-Schacht-Rodl-Skokan and Gowers, which
does not require the hypergraph regularity lemma and requires significantly
less computation. This in turn gives new proofs of several corollaries of the
hypergraph removal lemma, such as Szemeredi's theorem on arithmetic progressions.



1. Introduction 



It is an interesting phenomenon in mathematics that certain types of problems can 
be treated both by finitary means (e.g. using combinatorial analysis of finite sets), 
and by infinitary means (e.g. using constructions involving the axiom of choice), 
thus giving parallel but distinct ways to prove a single result. One particularly 
striking example of this is Szemeredi's theorem (see Theorem 2.1) on arithmetic 
progressions. This difficult and important theorem now has several proofs, both 
finitary and infinitary, using fields of mathematics as diverse as Fourier analysis, 
ergodic theory, graph theory, hypergraph theory, and elementary combinatorics; 
the finitary and infinitary arguments are connected by the beautiful Furstenberg
correspondence principle (see Section 2). These proofs have different strengths and 
weaknesses; generally speaking, the infinitary proofs are cleaner, shorter, and more 
elegant, but require significantly more machinery, whereas the finitary proofs are 
more elementary and provide more quantitative results, but tend to be messier and 
longer in nature. One particularly visible difference is that finitary proofs often 
require a number of small parameters (such as ε, δ) or large parameters (such as
N,M), whereas in the infinitary analogues of these proofs, the small parameters 
often have become zero and the large parameters have become infinite, which can 
lead to cleaner (but more subtle) arguments. 

Some progress has been made in reconciling the finitary and infinitary approaches^1,
as it has been increasingly realized that ideas and methods from the infinitary world
can be transferred to the finitary world, and vice versa; see for instance [29] for a
finitary version of the infinitary ergodic approach to Szemeredi's theorem. Such
a fusion of ideas from both sources proved to be particularly crucial in the recent
result [13] that the primes contained arbitrarily long progressions; this argument
was almost entirely finitary in nature, yet at the same time it relied heavily on ideas
from the infinitary world of ergodic theory (see [17], [15] for further discussion of
this connection).

^1 From a proof-theoretical perspective, one can use quantifier-elimination methods (such as
Herbrand's theorem) to automatically convert a large class of infinitary arguments to finitary ones;
this was for instance carried out for the Furstenberg-Weiss infinitary proof of van der Waerden's
theorem via topological dynamics, see [11]. However such methods do not seem to shed much light
on the connection between the infinitary proofs and the existing finitary proofs in the literature.

In this paper we investigate a transference in the other direction, taking results from
the finitary world of combinatorics (and in particular graph theory and hypergraph
theory), and identifying them with a corresponding result in the infinitary world,
which in this case turns out to be the world of probability theory^2 (or measure
theory). In particular, we present a correspondence principle, analogous to the
Furstenberg correspondence principle, that shows how any sequence of increasingly
large graphs or hypergraphs has a "weak limit", which we view as an infinitely
large random graph or hypergraph^3. This principle is slightly more complicated
than the Furstenberg correspondence principle, but does not use the full power of
deep results such as the Szemeredi regularity lemma or its extension to hypergraphs;
indeed we do not explicitly state or use such a regularity lemma in this work,
although ideas from that lemma are certainly involved in several components of the
argument.

The main advantage of passing from a deterministic finite graph to a random infinite
graph is that one now obtains a number of factors (σ-algebras) in the probability
space which enjoy some very useful invariance and relative independence properties.
One can think of the presence of these factors as being analogous to the partitions
obtained by the Szemeredi regularity lemma that make a graph ε-regular, but with
the distinction that the partition is now infinite and the ε parameter is set to zero
(so one now has perfect regularity). This sending of the epsilon parameters to zero
turns out to be extremely useful in cleaning up proofs of certain statements which
previously could only be proven via a regularity lemma. In particular, we will give
an infinitary proof here of the triangle removal lemma of Ruzsa and Szemeredi
[24], as well as the substantially more difficult hypergraph removal lemma of Nagle,
Rodl, Schacht, and Skokan [19], [20], [22], [23] and Gowers [12] (as well as a later
refinement in [30]). As this lemma is already strong enough to deduce Szemeredi's
theorem on arithmetic progressions (as well as a multidimensional generalisation
due to Furstenberg and Katznelson [9]), we have thus presented yet another proof
of Szemeredi's theorem here. These lemmas have some further applications; for
instance, they were used in [31] to show that the Gaussian primes contained
arbitrarily shaped constellations. In Appendix B we discuss the connections (or lack
thereof) between these infinitary removal lemmas, and the recurrence theorems of
Furstenberg and later authors.

^2 This is actually not all that surprising, given that finitary probability theory has already
proven to have a major role to play in graph theory.

^3 This is related to, but slightly different from, a different concept of graph limit developed by
Lovasz and Szegedy in [18], in which the limiting object becomes a "continuous weighted graph",
or more precisely a symmetric measurable function from [0,1] × [0,1] to [0,1]. Such a concrete
limiting object is particularly useful for computations such as counting the number of induced
subgraphs of a certain shape; it also can be used to establish results such as the triangle removal
lemma (Szegedy, personal communication).

The setting of this paper was deliberately placed at a midpoint between graph 
theory and ergodic theory, and the author hopes that it illuminates the analogies 
and interconnections between these two subjects. 

The author thanks Balázs Szegedy for many useful discussions, Timothy Gowers
for suggesting the original topic of investigation, Vitaly Bergelson for
encouragement, and Olivier Gerard for corrections. The author is especially indebted to the
anonymous referees for many corrections and suggestions. The author is supported 
by a grant from the Packard Foundation. 



2. Motivation: the Furstenberg correspondence principle 

To motivate the correspondence principle for graphs and hypergraphs, we first
review the Furstenberg correspondence principle which connects results such as
Szemeredi's theorem with recurrence results in ergodic theory. Let us recall Szemeredi's
theorem in a quantitative (finitary) form:

Theorem 2.1 (Szemeredi's theorem, quantitative version). [26] Let $0 < \delta \le 1$ and
$k \ge 1$. Let $A$ be a subset of a cyclic group $\mathbb{Z}_N := \mathbb{Z}/N\mathbb{Z}$ whose cardinality $|A|$ is
at least $\delta N$. Then there exist at least $c(k,\delta) N^2$ pairs $(x,r) \in \mathbb{Z}_N \times \mathbb{Z}_N$ such that
$x, x+r, \ldots, x+(k-1)r \in A$, where $c(k,\delta) > 0$ is a positive quantity depending
only on $k$ and $\delta$.
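For small $N$ the quantity bounded below in Theorem 2.1 can be computed by brute force. The Python sketch below is only an illustration (the name `count_progressions` is ours, not from the literature): it counts the pairs $(x,r) \in \mathbb{Z}_N \times \mathbb{Z}_N$ with $x, x+r, \ldots, x+(k-1)r \in A$, including the trivial pairs with $r = 0$. Theorem 2.1 asserts that the returned value is at least $c(k,\delta)N^2$ whenever $|A| \ge \delta N$.

```python
def count_progressions(A, N, k):
    """Count pairs (x, r) in Z_N x Z_N with x, x + r, ..., x + (k-1) r all in A.

    Brute force over all N^2 pairs. The trivial pairs (x, 0) with x in A are
    included, exactly as in the statement of Theorem 2.1.
    """
    A = {a % N for a in A}  # reduce the set modulo N
    return sum(
        1
        for x in range(N)
        for r in range(N)
        if all((x + j * r) % N in A for j in range(k))
    )


# A = {0, 1, 2} in Z_5 with k = 3: the three trivial progressions (r = 0),
# plus 0, 1, 2 (r = 1) and 2, 1, 0 (r = 4, i.e. r = -1).
print(count_progressions({0, 1, 2}, 5, 3))  # 5
```

The brute force costs $O(kN^2)$ operations, which is fine for toy values of $N$ but of course says nothing about how large $c(k,\delta)$ can be taken.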

This result is easily seen to imply Szemeredi's theorem in its traditional
(infinitary) form, which asserts that every set of integers of positive upper density
contains arbitrarily long progressions. The converse implication also follows from
an argument of Varnavides [33]. This particular formulation of Szemeredi's
theorem played an important role in the recent result [13] that the primes contained
arbitrarily long arithmetic progressions.

In 1977, Furstenberg obtained a new proof of Szemeredi's theorem by deducing it 
from the following result in ergodic theory. 

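Theorem 2.2 (Furstenberg recurrence theorem). [7], [10] Let $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ be a
probability space (see Appendix A for probabilistic notation). Let $T : \Omega \to \Omega$ be
a bi-measurable map which is probability preserving, thus $\mathbf{P}(T^n A) = \mathbf{P}(A)$ for all
events $A \in \mathcal{B}_{\max}$ and $n \in \mathbb{Z}$. Then for all $k \ge 1$ and all events $A \in \mathcal{B}_{\max}$ with
$\mathbf{P}(A) > 0$, we have

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbf{P}(A \wedge T^n A \wedge \cdots \wedge T^{(k-1)n} A) > 0.$$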
The deduction of Theorem 2.1 from Theorem 2.2 proceeds by the Furstenberg
correspondence principle [7], [10], [8]. Let us give a slightly non-standard exposition of
this principle (in particular drawing heavily on the language of probability theory),
in order to motivate an analogous principle for graphs and hypergraphs in later
sections. We shall interpret this correspondence principle as an assertion that any
sequence $(A^{(m)}, \mathbb{Z}_{N^{(m)}})$ of sets $A^{(m)}$ in a cyclic group $\mathbb{Z}_{N^{(m)}}$ can have an
asymptotic limit as $m \to \infty$, which will end up being a probability space endowed with
a probability-preserving shift $T$. To state this more precisely we shall need some
notation. First, we describe a certain universal space in which it will be convenient
to take limits.

Definition 2.3 (Furstenberg universal space). Let $\Omega := 2^{\mathbb{Z}} := \{B : B \subseteq \mathbb{Z}\}$ denote
the set of all subsets $B \subseteq \mathbb{Z}$ of the integers $\mathbb{Z}$; one can also view this space as the
infinite cube $\{0,1\}^{\mathbb{Z}}$ if desired. We give this space the product σ-algebra $\mathcal{B}_{\max}$,
generated by the events^4 $A_n := \{B \in \Omega : n \in B\}$ for $n \in \mathbb{Z}$. Indeed, one can
think of $(\Omega, \mathcal{B}_{\max})$ as being the universal event space generated by the countable
sequence of events $A_n$. The space enjoys an obvious shift action $T : \Omega \to \Omega$,
defined by $TB := B + 1 = \{n + 1 : n \in B\}$ for all $B \in \Omega$. This then induces a
shift $T : \mathcal{B}_{\max} \to \mathcal{B}_{\max}$ in the obvious manner, thus for instance $T^n A_0 = A_n$. We
define the regular algebra $\mathcal{B}_{\mathrm{reg}}$ of $\mathcal{B}_{\max}$ to be the algebra generated by the $A_n$, thus
the events in $\mathcal{B}_{\mathrm{reg}}$ (which we refer to as regular events) are those events which are
generated by at most finitely many of the $A_n$ (i.e. those events that only require
knowing the truth value of $n \in B$ for finitely many values of $n$).

Now we embed the finite objects $(A^{(m)}, \mathbb{Z}_{N^{(m)}})$ described earlier in this universal space.

Definition 2.4 (Furstenberg universal embedding). Let $m \ge 1$, let $\mathbb{Z}_{N^{(m)}}$ be a
cyclic group with $N^{(m)} \ge m$, and let $A^{(m)}$ be a subset of $\mathbb{Z}_{N^{(m)}}$. We define the
probability space $(\Omega^{(m)}, \mathcal{B}^{(m)}_{\max}, \mathbf{P}^{(m)})$ as the space corresponding to sampling^5 $x^{(m)}$
and $\lambda^{(m)}$ uniformly and independently at random from $\mathbb{Z}_{N^{(m)}}$ and $[L^{(m)}]$, where
$[N] := \{1, \ldots, N\}$ denotes the integers from 1 to $N$ and $L^{(m)} \ge 1$ is the integer
part of $N^{(m)}/m$. We then map every pair $(x^{(m)}, \lambda^{(m)})$ of $\Omega^{(m)}$ to a point $B^{(m)} \in \Omega$
(i.e. a subset of the integers $\mathbb{Z}$) by the formula

$$B^{(m)} := \{n \in \mathbb{Z} : x^{(m)} + n\lambda^{(m)} \in A^{(m)}\};$$

one can think of this as a random lifting of the set $A^{(m)} \subseteq \mathbb{Z}_{N^{(m)}}$ up to the
integers $\mathbb{Z}$. This mapping from $\Omega^{(m)}$ to $\Omega$ is clearly measurable, since the inverse
images of the generating events $A_n$ in $(\Omega, \mathcal{B}_{\max})$ are simply the events that
$x^{(m)} + n\lambda^{(m)} \in A^{(m)}$, which are certainly measurable in $\mathcal{B}^{(m)}_{\max}$. This allows us to extend
the probability measure $\mathbf{P}^{(m)}$ from $(\mathcal{B}^{(m)}_{\max}, \Omega^{(m)})$ to the product space
$(\mathcal{B}_{\max} \times \mathcal{B}^{(m)}_{\max}, \Omega \times \Omega^{(m)})$ in a canonical manner^6, identifying the events $A_n$ with the events
$x^{(m)} + n\lambda^{(m)} \in A^{(m)}$. We shall abuse notation and refer to the extended measure
also as $\mathbf{P}^{(m)}$.

^4 A more topological way of thinking about this proceeds by endowing $\Omega$ with the product topology,
so that it becomes a compact Hausdorff totally disconnected space, and then letting $\mathcal{B}_{\max}$ be the
Borel σ-algebra, generated by the open sets. The regular algebra $\mathcal{B}_{\mathrm{reg}}$ then consists of those events
which are simultaneously open and closed, or equivalently those events whose indicator function
is continuous.

^5 The introduction of the dilation parameter $\lambda^{(m)}$ is essentially the averaging trick of Varnavides
[33]. The exact construction of this space is not important so long as one has the independent
random variables $x^{(m)}$ and $\lambda^{(m)}$, but for sake of concreteness one can set $\Omega^{(m)} := \mathbb{Z}_{N^{(m)}} \times [L^{(m)}]$,
$\mathcal{B}^{(m)}_{\max} := 2^{\Omega^{(m)}}$ to be the power set of $\Omega^{(m)}$, and $\mathbf{P}^{(m)}$ to be the uniform distribution on $\Omega^{(m)}$.
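For toy parameters, the measure $\mathbf{P}^{(m)}$ of Definition 2.4 can be evaluated exactly on regular events by enumerating the $N^{(m)} L^{(m)}$ equally likely pairs $(x^{(m)}, \lambda^{(m)})$, as in the concrete model $\Omega^{(m)} = \mathbb{Z}_{N^{(m)}} \times [L^{(m)}]$ suggested above. The Python sketch below assumes a regular event is passed as a predicate that queries membership $n \in B^{(m)}$ for finitely many $n$; the function name `embedding_probability` is hypothetical. It confirms, for instance, that $\mathbf{P}^{(m)}(A_0) = |A^{(m)}|/N^{(m)}$, the density of $A^{(m)}$.

```python
from fractions import Fraction


def embedding_probability(A, N, m, event):
    """Exact P^(m)(event) for the Furstenberg embedding of Definition 2.4.

    (x, lam) ranges uniformly over Z_N x [L] with L = floor(N / m), and the
    random set is B = {n in Z : x + n * lam in A}. `event` receives a
    membership oracle n -> (n in B) and should only query finitely many n,
    i.e. it should describe a regular event.
    """
    assert N >= m >= 1
    A = {a % N for a in A}
    L = N // m
    hits = 0
    for x in range(N):
        for lam in range(1, L + 1):
            def in_B(n, x=x, lam=lam):
                return (x + n * lam) % N in A
            if event(in_B):
                hits += 1
    return Fraction(hits, N * L)


A, N = {0, 1, 2}, 5
print(embedding_probability(A, N, 1, lambda b: b(0)))  # 3/5, the density of A
# A_0 and A_1 and A_2: with m = 1 the dilations lam = 1..5 cover every step r mod 5.
print(embedding_probability(A, N, 1, lambda b: b(0) and b(1) and b(2)))  # 1/5
```

With $m = 1$ the second value is exactly the progression count of Theorem 2.1 divided by $N^2$; for larger $m$ only the dilations $\lambda \le N/m$ are sampled, which is the Varnavides-style averaging.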

In more informal terms, the Furstenberg embedding has created, for each $m$, a
random set $B^{(m)} \subseteq \mathbb{Z}$ which will capture all the important information about the
original set $A^{(m)}$ and $\mathbb{Z}_{N^{(m)}}$. For instance, the density of $A^{(m)}$ is nothing more than
the probability that 0 lies in $B^{(m)}$, or equivalently the probability of the event $A_0$.
One can view $x^{(m)}$ and $\lambda^{(m)}$ as the "hidden variables" which generate this random
set $B^{(m)}$. However, in order to invoke the correspondence principle we will need to
"forget" that the random set actually came from these variables; indeed, we
are going to restrict $\mathbf{P}^{(m)}$ to the common factor $(\Omega, \mathcal{B}_{\max})$ in order to take limits
as $m \to \infty$. More precisely, we have

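Proposition 2.5 (Furstenberg correspondence principle). For every $m \ge 1$, let
$\mathbb{Z}_{N^{(m)}}$ be a cyclic group with $N^{(m)} \ge m$, let $A^{(m)}$ be a subset of $\mathbb{Z}_{N^{(m)}}$, and let
$\mathbf{P}^{(m)}$ be as in Definition 2.4. Then there exists a subsequence $0 < m_1 < m_2 < \ldots$
of $m$, and a probability measure $\mathbf{P}^{(\infty)}$ on the Furstenberg universal space $(\Omega, \mathcal{B}_{\max})$,
such that we have the weak convergence property

$$\lim_{i \to \infty} \mathbf{P}^{(m_i)}(E) = \mathbf{P}^{(\infty)}(E) \quad \text{for all } E \in \mathcal{B}_{\mathrm{reg}}. \tag{1}$$

Furthermore, we have the shift invariance property

$$\mathbf{P}^{(\infty)}(T^n E) = \mathbf{P}^{(\infty)}(E) \quad \text{for all } E \in \mathcal{B}_{\max},\ n \in \mathbb{Z}. \tag{2}$$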

Proof. The algebra $\mathcal{B}_{\mathrm{reg}}$ is countable. Thus the existence of the weak limit $\mathbf{P}^{(\infty)}$
follows from Lemma A.15. Now observe that the random sets $B^{(m)}$ and $T^n B^{(m)} =
B^{(m)} + n$ have the same probability distribution (because $x^{(m)}$ and $x^{(m)} - n\lambda^{(m)}$ have the
same distribution for any fixed $\lambda^{(m)}$). Thus we observe that $\mathbf{P}^{(m)}$ is shift-invariant:

$$\mathbf{P}^{(m)}(T^n E) = \mathbf{P}^{(m)}(E) \quad \text{for all } E \in \mathcal{B}_{\max},\ n \in \mathbb{Z}.$$

Applying (1) we obtain (2) for all regular events $E$. But since $\mathbf{P}^{(\infty)}$ is countably
additive, we see that the space of events $E$ for which (2) holds for every $n$ is a
σ-algebra which contains $\mathcal{B}_{\mathrm{reg}}$, and thus contains $\mathcal{B}_{\max}$ as claimed. ■
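The key step in this proof, that $B^{(m)}$ and $B^{(m)} + n$ have the same distribution, can be checked exactly on toy examples. The hypothetical helper `window_distribution` below tabulates, over all equally likely $(x^{(m)}, \lambda^{(m)})$, the pattern of memberships of $B^{(m)}$ in a finite window of integers. Shifting the window by any $s$ leaves this multiset unchanged, because $x \mapsto x + s\lambda$ is a bijection of $\mathbb{Z}_N$ for each fixed $\lambda$. This is a sanity check of $\mathbf{P}^{(m)}(T^n E) = \mathbf{P}^{(m)}(E)$ for regular events supported in the window, not a proof.

```python
from collections import Counter


def window_distribution(A, N, m, start, width):
    """Multiset of membership patterns of B = {n : x + n * lam in A} on the
    window start, ..., start + width - 1, over all (x, lam) in Z_N x [N // m]."""
    A = {a % N for a in A}
    L = N // m
    return Counter(
        tuple((x + n * lam) % N in A for n in range(start, start + width))
        for x in range(N)
        for lam in range(1, L + 1)
    )


A, N, m = {0, 1, 3, 7}, 11, 2
base = window_distribution(A, N, m, 0, 4)
# Every shifted window (including negative shifts) has the same pattern distribution.
print(all(window_distribution(A, N, m, s, 4) == base for s in range(-5, 6)))  # True
```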

Now we can deduce Theorem 2.1 from Theorem 2.2. 

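Proof [of Theorem 2.1 assuming Theorem 2.2]. Suppose that Theorem 2.1 fails.
Then we can find $k \ge 1$ and $0 < \delta \le 1$, a sequence $N^{(m)}$ of positive integers, and a
sequence of sets $A^{(m)} \subseteq \mathbb{Z}_{N^{(m)}}$ of density $|A^{(m)}|/N^{(m)} \ge \delta$ such that

$$\lim_{m \to \infty} \frac{1}{(N^{(m)})^2} \left|\{(x,r) \in \mathbb{Z}_{N^{(m)}} \times \mathbb{Z}_{N^{(m)}} : x, x+r, \ldots, x+(k-1)r \in A^{(m)}\}\right| = 0.$$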



^6 More precisely, we graph the measurable mapping from $\Omega^{(m)}$ to $\Omega$ as a measurable mapping
from $\Omega^{(m)}$ to $\Omega \times \Omega^{(m)}$, which contravariantly induces a σ-algebra homomorphism from the
product σ-algebra $\mathcal{B}_{\max} \times \mathcal{B}^{(m)}_{\max}$ to $\mathcal{B}^{(m)}_{\max}$. Pulling back the probability measure $\mathbf{P}^{(m)}$ under this
homomorphism yields the extension. A similar construction applies to the graph and hypergraph
embeddings in later sections.
By passing to a subsequence of $m$ if desired we can make this convergence arbitrarily
fast; for instance, we can ensure that

$$\frac{1}{(N^{(m)})^2} \left|\{(x,r) \in \mathbb{Z}_{N^{(m)}} \times \mathbb{Z}_{N^{(m)}} : x, x+r, \ldots, x+(k-1)r \in A^{(m)}\}\right| \le \delta\, 100^{-m}. \tag{3}$$

Observe that the left-hand side is at least $|A^{(m)}|/(N^{(m)})^2 \ge \delta/N^{(m)}$ (from the trivial
progressions with $r = 0$), so we conclude that

$$N^{(m)} \ge 100^m.$$

In particular $N^{(m)} \ge m$, so we can invoke the Furstenberg correspondence principle
and obtain a shift-invariant system $(\Omega, \mathcal{B}_{\max}, \mathbf{P}^{(\infty)})$ on the Furstenberg universal
space $(\Omega, \mathcal{B}_{\max})$ with the stated properties.

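Now let us compute some probabilities in this system, starting with the probability
of $A_0$. From the definition of $\mathbf{P}^{(m)}$ we have

$$\mathbf{P}^{(m)}(A_0) = \mathbf{P}^{(m)}(x^{(m)} \in A^{(m)}) = \frac{|A^{(m)}|}{N^{(m)}} \ge \delta,$$

so by (1) we have

$$\mathbf{P}^{(\infty)}(A_0) \ge \delta.$$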

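In particular $A_0$ has strictly positive probability. Next, let $1 \le n \le m$ and consider
the expression

$$\begin{aligned}
\mathbf{P}^{(m)}(A_0 \wedge T^n A_0 \wedge \cdots \wedge T^{(k-1)n} A_0) &= \mathbf{P}^{(m)}(A_0 \wedge A_n \wedge \cdots \wedge A_{(k-1)n}) \\
&= \mathbf{P}^{(m)}(x^{(m)} + jn\lambda^{(m)} \in A^{(m)} \text{ for all } 0 \le j < k) \\
&= \frac{1}{N^{(m)} L^{(m)}} \left|\{(x, \lambda) \in \mathbb{Z}_{N^{(m)}} \times [L^{(m)}] : x + jn\lambda \in A^{(m)} \text{ for all } 0 \le j < k\}\right|.
\end{aligned}$$

Now observe from the definition of $L^{(m)}$ that the progressions $x^{(m)}, x^{(m)} + n\lambda^{(m)}, \ldots, x^{(m)} +
(k-1)n\lambda^{(m)}$ are all distinct as $x^{(m)}$ and $\lambda^{(m)}$ vary: since $1 \le n\lambda^{(m)} \le nL^{(m)} \le N^{(m)}$,
distinct values of $\lambda^{(m)}$ give distinct steps $r = n\lambda^{(m)} \bmod N^{(m)}$. Applying (3), and using
$L^{(m)} \ge N^{(m)}/(2m)$ (which holds since $N^{(m)} \ge 100^m \ge 2m$), we see that

$$\mathbf{P}^{(m)}(A_0 \wedge T^n A_0 \wedge \cdots \wedge T^{(k-1)n} A_0) \le \frac{\delta\, 100^{-m} (N^{(m)})^2}{N^{(m)} L^{(m)}} \le 2m\, 100^{-m}$$

(say) for all $1 \le n \le m$. In particular we have

$$\lim_{m \to \infty} \mathbf{P}^{(m)}(A_0 \wedge T^n A_0 \wedge \cdots \wedge T^{(k-1)n} A_0) = 0$$

for each fixed $n \ge 1$, and hence by (1)

$$\mathbf{P}^{(\infty)}(A_0 \wedge T^n A_0 \wedge \cdots \wedge T^{(k-1)n} A_0) = 0$$

for all $n \ge 1$. But this contradicts Theorem 2.2. This completes the deduction of
Theorem 2.1 from Theorem 2.2. ■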
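Two elementary facts about $L^{(m)} = \lfloor N^{(m)}/m \rfloor$ were used above: for $1 \le n \le m$ the steps $n\lambda$, $\lambda \in [L^{(m)}]$, are distinct modulo $N^{(m)}$ (so distinct $(x^{(m)}, \lambda^{(m)})$ give distinct pairs $(x, r)$ in (3)), and $L^{(m)} \ge N^{(m)}/(2m)$ once $N^{(m)} \ge 2m$. The small Python sketch below (function names are hypothetical) checks both over a range of parameters, and shows that injectivity can fail when $n > m$.

```python
def dilation_is_injective(N, m, n):
    """True if lam -> n * lam mod N is injective on [L], where L = N // m.

    For 1 <= n <= m we have 1 <= n * lam <= n * L <= N, so the steps are
    distinct integers in [1, N] and hence distinct modulo N.
    """
    L = N // m
    return len({(n * lam) % N for lam in range(1, L + 1)}) == L


def dilation_range_is_large(N, m):
    """True if L = N // m satisfies L >= N / (2m)."""
    return 2 * m * (N // m) >= N


print(all(dilation_is_injective(N, m, n)
          for N in range(1, 40) for m in range(1, N + 1) for n in range(1, m + 1)))  # True
print(all(dilation_range_is_large(N, m)
          for m in range(1, 40) for N in range(2 * m, 200)))  # True
# With n > m the steps can wrap around: N = 12, m = 3, n = 4 gives 4, 8, 0, 4.
print(dilation_is_injective(12, 3, 4))  # False
```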

Remark 2.6. Note that as this proof proceeded by contradiction, it does not
obviously give any sort of quantitative lower bound for the quantity $c(k,\delta)$ appearing
in Theorem 2.1. It is actually possible (with nontrivial effort) to extract such a
bound by taking the proof of Theorem 2.2 and making everything finitary; see [29].
However the bounds obtained in this manner are extremely poor. The same
remarks apply to the infinitary proofs of the triangle removal lemma and hypergraph
removal lemma that we give below. As a related remark, observe that the above
argument, while infinitary, did not require the axiom of choice, as one can eliminate
the apparent use of choice at the beginning of the argument by well-ordering the
objects $A$, $\mathbb{Z}_N$, $\delta$ appearing in Theorem 2.1 in some standard manner. (The use of
Lemma A.15 also does not require choice; see Remark A.16. The original proof of
the Furstenberg recurrence theorem in [7] is also choice-free, though the later proof
in [10] is not, as it uses Zorn's lemma.) Indeed we will not actually need the axiom
of choice in this entire paper, though we shall assume it in order to simplify the
exposition slightly.

Remark 2.7. One can also reverse the above argument, and use Theorem 2.1 to
deduce Theorem 2.2, basically by applying Theorem 2.1 to various truncated versions
of the random set $B := \{n \in \mathbb{Z} : T^n x \in A\}$, where $x$ is sampled from the sample
space $\Omega$, using the probability measure $\mathbf{P}$. We omit the standard details.

3. The graph correspondence principle 

We now develop an analogue of the Furstenberg correspondence principle for graphs;
namely, we start with a sequence of (undirected) graphs $G^{(m)} = (V^{(m)}, E^{(m)})$ for
each $m \ge 1$, and wish to extract (after passing to a subsequence of $m$'s) some sort
of infinitary weak limit. This type of problem was already addressed in [18], with
the main tool being a certain weak form of the Szemeredi regularity lemma. Our
approach is somewhat similar (though not identical), and the regularity lemma will
appear only after the infinite limit is extracted, in Lemma 3.5 below.

As before, we need a universal space in which to take limits. Just as the Furstenberg
universal space consisted of infinite sets of integers, the graph universal space will
consist of infinite graphs on the natural numbers. The shift $T$ (which represents a
$\mathbb{Z}$-action) is now replaced^7 by the action of the permutation group $S_\infty$, defined as
the group of all permutations $\sigma : \mathbb{N} \to \mathbb{N}$ of the natural numbers.

Definition 3.1 (Graph universal space). Let $\mathbb{N} := \{1, 2, \ldots\}$ denote the natural
numbers, and let $\Omega := 2^{\binom{\mathbb{N}}{2}} = \{(\mathbb{N}, E_\infty) : E_\infty \subseteq \binom{\mathbb{N}}{2}\}$ denote the space of all
(infinite) graphs $(\mathbb{N}, E_\infty)$ on the natural numbers, thus the edge set $E_\infty$ is an
arbitrary collection of unordered pairs of distinct integers. On this space $\Omega$, we
introduce the events $A_{i,j} = A_{j,i}$ for any unordered pair of distinct natural numbers
$\{i,j\} \in \binom{\mathbb{N}}{2}$ by $A_{i,j} := \{(\mathbb{N}, E_\infty) \in \Omega : \{i,j\} \in E_\infty\}$, and let $\mathcal{B}_{\max}$ be the σ-algebra
generated by the countable sequence of events $A_{i,j}$. (We adopt the convention that
$A_{i,i} = \emptyset$ for all $i \in \mathbb{N}$, thus our graphs have no loops.) We also introduce the regular
algebra $\mathcal{B}_{\mathrm{reg}}$ generated by the $A_{i,j}$, thus these are the events that depend only on
finitely many of the $A_{i,j}$. For any permutation $\sigma : \mathbb{N} \to \mathbb{N}$ of the natural numbers,
we define the associated action on $\mathcal{B}_{\max}$ by mapping $\sigma : A_{i,j} \mapsto A_{\sigma(i),\sigma(j)}$ and
extending this to a σ-algebra isomorphism in the unique manner. More explicitly, $\sigma$
will map each graph $(\mathbb{N}, E_\infty)$ to the graph $(\mathbb{N}, \sigma E_\infty)$, where $\sigma E_\infty := \{\{\sigma(i), \sigma(j)\} :
\{i,j\} \in E_\infty\}$. For any (possibly infinite) subset $I$ of $\mathbb{N}$, we define $\mathcal{B}_I$ to be the factor
of $\mathcal{B}_{\max}$ generated by the events $A_{i,j}$ for $i,j \in I$; informally speaking, $\mathcal{B}_I$ represents
the knowledge obtained by measuring the restriction of $E_\infty$ to $I$. Observe the trivial
monotonicity $\mathcal{B}_I \subseteq \mathcal{B}_J$ whenever $I \subseteq J$.

^7 We are indebted to Balázs Szegedy for pointing out the analogy between the $\mathbb{Z}$-action of a
dynamical system and the $S_\infty$-action on an infinite graph.

The space $(\Omega, \mathcal{B}_{\max})$ is thus the universal event space associated to the events
$A_{i,j}$, and is the natural event space for studying infinite random graphs. (For
instance, the infinite Erdos-Renyi random graph $G(\infty, p)$ for fixed $0 < p < 1$,
where the vertex set is $\mathbb{Z}$ (say) and any two integers are connected by an edge with
an independent probability of $p$, would correspond to the scenario in which all the
events $A_{i,j}$ are independent with probability $p$ each.) The permutation group $S_\infty$
defined earlier acts on the event space $(\Omega, \mathcal{B}_{\max})$ in the obvious manner. Thus for
instance $\sigma(\mathcal{B}_I) = \mathcal{B}_{\sigma(I)}$ for all $\sigma \in S_\infty$ and $I \subseteq \mathbb{N}$.

Next, we need a way to embed every finite graph into the universal space.

Definition 3.2 (Graph universal embedding). Let $m \ge 1$, and let $G^{(m)} = (V^{(m)}, E^{(m)})$
be a finite graph. Let $(\Omega^{(m)}, \mathcal{B}^{(m)}_{\max}, \mathbf{P}^{(m)})$ be the probability space corresponding to
the sampling of a countable sequence^8 of i.i.d. random variables $x_1^{(m)}, x_2^{(m)}, \ldots \in
V^{(m)}$ sampled independently and uniformly at random^9. To every sequence $(x_1^{(m)}, x_2^{(m)}, \ldots)$
we associate an infinite graph $G^{(\infty)} = (\mathbb{N}, E^{(\infty)}) \in \Omega$ by setting

$$E^{(\infty)} := \Big\{\{i,j\} \in \tbinom{\mathbb{N}}{2} : \{x_i^{(m)}, x_j^{(m)}\} \in E^{(m)}\Big\};$$

one can think of this as a random lifting of the graph $G^{(m)}$ on $V^{(m)}$ up to an infinite
graph $G^{(\infty)}$ on the natural numbers $\mathbb{N}$. This mapping from $\Omega^{(m)}$ to $\Omega$ is clearly
measurable, since the inverse images of the generating events $A_{i,j}$ in $(\Omega, \mathcal{B}_{\max})$ are
simply the events that $\{x_i^{(m)}, x_j^{(m)}\}$ lie in $E^{(m)}$, which are certainly measurable in
$\mathcal{B}^{(m)}_{\max}$. This allows us to extend the probability measure $\mathbf{P}^{(m)}$ from $(\mathcal{B}^{(m)}_{\max}, \Omega^{(m)})$ to
the product space $(\mathcal{B}_{\max} \times \mathcal{B}^{(m)}_{\max}, \Omega \times \Omega^{(m)})$ in a canonical manner, identifying the
events $A_{i,j}$ with the events $\{x_i^{(m)}, x_j^{(m)}\} \in E^{(m)}$. We shall abuse notation and refer
to the extended measure also as $\mathbf{P}^{(m)}$.

Remarks 3.3. Now that the space $(\Omega^{(m)}, \mathcal{B}^{(m)}_{\max})$ is infinite, not every event involving
the $x_i^{(m)}$ is measurable; however any event which involves only finitely many of
the $x_i^{(m)}$ is certainly measurable (and in particular has a well-defined probability).
One can view $E^{(\infty)}$ as the infinite random graph formed by statistically sampling
the original finite (and deterministic) graph $E^{(m)}$. This is a convenient way to
convert arbitrary graphs, on arbitrary vertex sets, to a fixed universal (random)
graph on a fixed universal vertex set, in this case the natural numbers $\mathbb{N}$. The
random graph $E^{(\infty)}$ turns out to capture all the relevant features we require of
the original graph; for instance, the probability that $E^{(\infty)}$ lies in the event $A_{1,2}$
is essentially^10 the edge density of $E^{(m)}$, while the probability that $E^{(\infty)}$ lies in
$A_{1,2} \wedge A_{2,3} \wedge A_{3,1}$ is essentially the triangle density of $E^{(m)}$, and so forth. On the
other hand, it suppresses irrelevant features such as what the labels of the original
vertex set were; in particular, applying a graph isomorphism to $E^{(m)}$ does
not affect the probability distribution of $E^{(\infty)}$ at all. More generally, we observe
the permutation invariance

$$\mathbf{P}^{(m)}(\sigma E) = \mathbf{P}^{(m)}(E) \quad \text{for all } E \in \mathcal{B}_{\max},\ \sigma \in S_\infty, \tag{4}$$

which can be verified by first checking on regular events $E$ (i.e. finite boolean
combinations of the $A_{i,j}$) and then extending as in the proof of the Furstenberg
correspondence principle.

^8 This sequence contains the "hidden variables" that will play the role of the parameters $x^{(m)}$
and $\lambda^{(m)}$ in the preceding section. Again, the exact construction of this Wiener-type probability
space is not important. The most canonical way to proceed is to let $\Omega^{(m)}$ be the countable product
$(V^{(m)})^{\mathbb{N}}$ with the product σ-algebra and the product uniform probability measure $\mathbf{P}^{(m)}$. A
more concrete way would be to identify $V^{(m)}$ with $[n^{(m)}] = \{1, \ldots, n^{(m)}\}$ by appropriate labeling,
set $\Omega^{(m)}$ to be the unit interval $[0,1) := \{x : 0 \le x < 1\}$, let $\mathcal{B}^{(m)}_{\max}$ be the Borel σ-algebra, $\mathbf{P}^{(m)}$
be Lebesgue measure, and let $x_j$ be the $j$th digit in the base-$n^{(m)}$ expansion of $x$ (rounding down
when a terminating decimal occurs).

^9 Of course for any fixed $m$ there will be infinitely many repetitions among these $x_i^{(m)}$ since
$V^{(m)}$ is finite, but in practice we are interested in taking limits in which $|V^{(m)}| \to \infty$, and so
these collisions will become asymptotically negligible.
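For a small graph the probabilities in Remarks 3.3 can be computed exactly, since an event depending only on the $A_{i,j}$ with $i, j \le k$ depends only on $x_1^{(m)}, \ldots, x_k^{(m)}$, which are uniform on $(V^{(m)})^k$. The Python sketch below (all helper names and the example graph are ours, for illustration only) builds the finite piece of $E^{(\infty)}$ from the hidden variables, recovers $\mathbf{P}^{(m)}(A_{1,2}) = 2|E^{(m)}|/|V^{(m)}|^2$ and the ordered triangle density, and checks the permutation invariance (4) for one regular event on the labels $1, \ldots, 4$. The value $2|E|/|V|^2$ includes the diagonal pairs $x_1^{(m)} = x_2^{(m)}$, which is why the text only says "essentially" the edge density.

```python
from fractions import Fraction
from itertools import permutations, product


def lifted_graph(edges, xs):
    """Edges among labels 1..k of the random graph E^(infty) of Definition 3.2,
    given hidden variables xs = (x_1, ..., x_k): {i, j} is an edge iff
    {x_i, x_j} is an edge of the finite graph (repeated vertices give no edge)."""
    k = len(xs)
    return {frozenset((i, j))
            for i in range(1, k + 1) for j in range(i + 1, k + 1)
            if frozenset((xs[i - 1], xs[j - 1])) in edges}


def regular_event_probability(vertices, edges, k, event):
    """Exact P^(m)(event) for an event depending only on A_{i,j}, i, j <= k:
    average over all |V|^k equally likely tuples (x_1, ..., x_k)."""
    vertices = list(vertices)
    hits = sum(1 for xs in product(vertices, repeat=k)
               if event(lifted_graph(edges, xs)))
    return Fraction(hits, len(vertices) ** k)


def permute(sigma, graph):
    """sigma E = {{sigma(i), sigma(j)} : {i, j} in E}, sigma a dict on labels."""
    return {frozenset(sigma[i] for i in e) for e in graph}


def edge_12(g):
    return frozenset((1, 2)) in g


def triangle_123(g):
    return all(frozenset(p) in g for p in [(1, 2), (2, 3), (1, 3)])


def edge_12_not_34(g):
    return frozenset((1, 2)) in g and frozenset((3, 4)) not in g


V = range(5)
E = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2), (2, 3)]}  # triangle plus pendant edge
print(regular_event_probability(V, E, 2, edge_12))       # 8/25 = 2|E| / |V|^2
print(regular_event_probability(V, E, 3, triangle_123))  # 6/125: 6 ordered triangles

# Permutation invariance (4) for the regular event "A_{1,2} and not A_{3,4}".
base = regular_event_probability(V, E, 4, edge_12_not_34)
print(all(regular_event_probability(
              V, E, 4, lambda g: edge_12_not_34(permute(dict(zip(range(1, 5), p)), g))) == base
          for p in permutations(range(1, 5))))  # True
```

The invariance holds because relabelling $1, \ldots, 4$ merely permutes the i.i.d. hidden variables, which is the finite shadow of (4).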

Once again, we can view the random graph $E^{(\infty)}$ as being generated^11 by "hidden
variables" $x_1^{(m)}, x_2^{(m)}, \ldots$. As before, we wish to "forget" these hidden variables
and pass to a limit. This can be achieved as follows.

Proposition 3.4 (Graph correspondence principle). For every $m \ge 1$, let $G^{(m)} =
(V^{(m)}, E^{(m)})$ be a finite undirected graph, and let $\mathbf{P}^{(m)}$ be as in Definition 3.2.
Then there exists a subsequence $0 < m_1 < m_2 < \ldots$ of $m$, and a probability
measure $\mathbf{P}^{(\infty)}$ on the graph universal space $(\Omega, \mathcal{B}_{\max})$, such that we have the weak
convergence property (1) and the permutation invariance property

$$\mathbf{P}^{(\infty)}(\sigma E) = \mathbf{P}^{(\infty)}(E) \quad \text{for all } E \in \mathcal{B}_{\max},\ \sigma \in S_\infty. \tag{5}$$

Proof. The algebra $\mathcal{B}_{\mathrm{reg}}$ is countable. Thus the existence of the weak limit $\mathbf{P}^{(\infty)}$
follows from Lemma A.15. From (4) we can deduce (5) by arguing exactly as in
the Furstenberg correspondence principle. ■

So far, the permutation group $S_\infty$ has played the same role for graphs as the integer
group $\mathbb{Z}$ played for sets of integers. However, the permutation group is significantly
more "mixing", which allows us to immediately "regularise" the system obtained
in Proposition 3.4:

Lemma 3.5 (Infinitary regularity lemma). Let $\mathbf{P}$ be a probability measure on the
graph universal space $(\Omega, \mathcal{B}_{\max})$ which is permutation-invariant in the sense of (5).
Then for any $I, I_1, \ldots, I_l \subseteq \mathbb{N}$ with $I \cap I_1 \cap \cdots \cap I_l$ infinite, the factors $\mathcal{B}_I$ and
$\bigvee_{i=1}^{l} \mathcal{B}_{I_i}$ are relatively independent conditioning on $\bigvee_{i=1}^{l} \mathcal{B}_{I \cap I_i}$, with respect to this
probability measure $\mathbf{P}$. (See Appendix A for a definition of relative independence.)

^10 We say "essentially" because there is a slight error term coming from the event that $x_1^{(m)} =
x_2^{(m)}$. However this error will become negligible in limits for which $|V^{(m)}| \to \infty$.

^11 This is of course the perspective taken in property testing. It is not surprising that the
Szemeredi regularity lemma plays a crucial role in that theory also; see [2]. Indeed, this argument
suggests that an infinitary approach to property testing theory is possible.
This result is the infinitary analogue of the Szemeredi regularity lemma, and will
play a crucial role in establishing the proof of the triangle removal lemma (and
later, the hypergraph removal lemma) in subsequent sections.

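Proof. Fix $I, I_1, \ldots, I_l$. We may assume $l \ge 1$ since the claim is trivial when
$l = 0$. To show that $\mathcal{B}_I$ and $\bigvee_{i=1}^{l} \mathcal{B}_{I_i}$ are relatively independent conditioning on
$\bigvee_{i=1}^{l} \mathcal{B}_{I \cap I_i}$ with respect to the probability measure $\mathbf{P}$, it suffices by Lemma A.26
to show that

$$\Big\|\mathbf{P}\Big(E_I \,\Big|\, \bigvee_{i=1}^{l} \mathcal{B}_{I_i}\Big)\Big\|_{L^2} = \Big\|\mathbf{P}\Big(E_I \,\Big|\, \bigvee_{i=1}^{l} \mathcal{B}_{I \cap I_i}\Big)\Big\|_{L^2}$$

for all $E_I \in \mathcal{B}_I$. By Lemma A.18 and limiting arguments we may assume without
loss of generality that $E_I$ is regular. In particular we have $E_I \in \mathcal{B}_{I'}$ for some finite
subset $I'$ of $I$. By Corollary A.20 and a limiting argument we may assume that the
set $I$ has an infinite complement. By another such limiting argument we can also
assume that $I_i \setminus I$ is finite for all $i$.

Let $A$ be the infinite set $A := (I \cap I_1 \cap \cdots \cap I_l) \setminus I'$, and let $B$ be the finite set
$B := \bigcup_{i=1}^{l} I_i \setminus I$. Then we can find a permutation $\sigma$ which maps
$A$ to $A \cup B$ bijectively, but is the identity on $I \setminus A$, and in particular fixes $I'$. Thus $\sigma$
also fixes $E_I$, and maps $I \cap I_i$ to $(I \cap I_i) \cup B$. Thus

$$\Big\|\mathbf{P}\Big(E_I \,\Big|\, \bigvee_{i=1}^{l} \mathcal{B}_{(I \cap I_i) \cup B}\Big)\Big\|_{L^2} = \Big\|\mathbf{P}\Big(E_I \,\Big|\, \bigvee_{i=1}^{l} \mathcal{B}_{I \cap I_i}\Big)\Big\|_{L^2}.$$

But as $\mathcal{B}_{I_i}$ lies between $\mathcal{B}_{I \cap I_i}$ and $\mathcal{B}_{(I \cap I_i) \cup B}$, the claim now follows from Lemma
A.12. ■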



Remark 3.6. The above proof of the regularity lemma is short but perhaps a bit
opaque. Let us informally discuss a special case of this lemma, namely that the
events $A_{1,3}$ and $A_{2,3}$ are relatively independent conditioning on $\mathcal{B}_{\{3,4,5,\ldots\}}$; this is
a special case of the situation where $I = \mathbb{N} \setminus \{2\}$, $I_1 = \mathbb{N} \setminus \{1\}$, and $l = 1$. Passing
back to the finite graph setting (by working with the probability measures $\mathbf{P}^{(m)}$
from Proposition 3.4), this claim may seem puzzling at first, because the events
$\{x_1^{(m)}, x_3^{(m)}\} \in E^{(m)}$ and $\{x_2^{(m)}, x_3^{(m)}\} \in E^{(m)}$ can certainly be correlated; indeed,
whenever $x_3^{(m)}$ has high degree, then both events occur with high probability, and
when it has low degree, both events occur with low probability. However, if one
can somehow learn the degree of $x_3^{(m)}$, then these two events become relatively
independent conditioning on the degree of $x_3^{(m)}$. And now the purpose of the factor
$\mathcal{B}_{\{3,4,5,\ldots\}}$ becomes clear; by "polling" many additional vertices $x_4^{(m)}, x_5^{(m)}, x_6^{(m)}, \ldots$
and measuring the connectivity of $x_3^{(m)}$ with all of these additional vertices, we can
obtain a statistical prediction for the degree of $x_3^{(m)}$, whose accuracy and confidence
level become almost surely perfect in the asymptotic limit $N \to \infty$, where $N$ is the number
of polled vertices. More generally,
it turns out that by polling the interconnectivity of vertices in the infinite set $I \cap I_i$
for $i = 1, \ldots, l$ one can obtain an almost surely perfectly accurate prediction of all
the "common information" held between an event in $\mathcal{B}_I$ and an event in $\bigvee_{i=1}^{l} \mathcal{B}_{I_i}$.

Let us illustrate this with one further example, namely the relative independence
of $A_{1,2}$ and $A_{2,3} \wedge A_{1,3}$ conditioning on $\mathcal{B}_{\{1,4,5,\ldots\}} \vee \mathcal{B}_{\{2,4,5,\ldots\}}$; this corresponds
to the case $I = \mathbb{N} \setminus \{3\}$, $l = 2$, and $I_i = \mathbb{N} \setminus \{i\}$ for $i = 1, 2$. We are asking for
the events $\{x_1^{(m)}, x_2^{(m)}\} \in E^{(m)}$ and $\{x_1^{(m)}, x_3^{(m)}\}, \{x_2^{(m)}, x_3^{(m)}\} \in E^{(m)}$ to become
relatively independent once we sample all the connectivity information between
$x_1^{(m)}$ and $x_4^{(m)}, x_5^{(m)}, \ldots$, and between $x_2^{(m)}$ and $x_4^{(m)}, x_5^{(m)}, \ldots$. To see how this will
work, observe that while the two events in question will not be unconditionally
independent in general, they will become conditionally independent once the number
of paths of length two connecting $x_1^{(m)}$ and $x_2^{(m)}$ is known, since upon freezing
$x_1^{(m)}$ and $x_2^{(m)}$ this determines the probability that the independent variable $x_3^{(m)}$
will satisfy the latter event $\{x_1^{(m)}, x_3^{(m)}\}, \{x_2^{(m)}, x_3^{(m)}\} \in E^{(m)}$; since $x_3^{(m)}$ does not
affect the former event $\{x_1^{(m)}, x_2^{(m)}\} \in E^{(m)}$, we obtain relative independence. But
the number of paths of length two can be determined statistically, by counting
the proportion of $j \in \{4, 5, \ldots\}$ for which $\{x_1^{(m)}, x_j^{(m)}\}$ and $\{x_2^{(m)}, x_j^{(m)}\}$ both lie
in $E^{(m)}$. This lies in the factor $\mathcal{B}_{\{1,4,5,\ldots\}} \vee \mathcal{B}_{\{2,4,5,\ldots\}}$ and is the reason for the
conditional independence^12.
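Both examples in Remark 3.6 can be made concrete on a star graph, where degrees are very unequal. In the Python sketch below (the graph and all function names are hypothetical illustrations, and the polling uses a seeded pseudorandom generator), the events $A_{1,3}$ and $A_{2,3}$ are positively correlated, but given $x_3^{(m)} = v$ their probabilities factor exactly. The conditioning information is recoverable by polling: the fraction of freshly sampled vertices adjacent to $x_3^{(m)}$ estimates $\deg(x_3^{(m)})/|V^{(m)}|$, and the fraction adjacent to both $x_1^{(m)}$ and $x_2^{(m)}$ estimates the number of paths of length two between them, divided by $|V^{(m)}|$.

```python
import random
from fractions import Fraction


def star_graph(n):
    """Hypothetical test graph: a star on vertices 0..n-1 with centre 0."""
    return {frozenset((0, v)) for v in range(1, n)}


def conditional_probs(edges, n, v):
    """Exact P(A_13 | x3 = v), P(A_23 | x3 = v) and P(A_13 and A_23 | x3 = v),
    enumerating all (x1, x2) in V^2 with V = range(n)."""
    c13 = c23 = both = 0
    for x1 in range(n):
        for x2 in range(n):
            e13 = frozenset((x1, v)) in edges
            e23 = frozenset((x2, v)) in edges
            c13 += e13
            c23 += e23
            both += e13 and e23
    total = n * n
    return Fraction(c13, total), Fraction(c23, total), Fraction(both, total)


def unconditional_probs(edges, n):
    """Average the conditional probabilities over the uniform choice of x3."""
    p13 = p23 = p_both = Fraction(0)
    for v in range(n):
        a, b, c = conditional_probs(edges, n, v)
        p13, p23, p_both = p13 + a / n, p23 + b / n, p_both + c / n
    return p13, p23, p_both


def polled_degree(edges, n, x3, polls, rng):
    """Fraction of fresh uniform vertices x_4, x_5, ... adjacent to x3: a
    B_{3,4,5,...}-measurable statistic estimating deg(x3) / n."""
    return sum(frozenset((x3, rng.randrange(n))) in edges for _ in range(polls)) / polls


def polled_codegree(edges, n, x1, x2, polls, rng):
    """Fraction of fresh vertices x_j adjacent to both x1 and x2: estimates the
    number of paths of length two from x1 to x2, divided by n."""
    hits = 0
    for _ in range(polls):
        xj = rng.randrange(n)
        hits += frozenset((x1, xj)) in edges and frozenset((x2, xj)) in edges
    return hits / polls


n = 10
E = star_graph(n)
p13, p23, p_both = unconditional_probs(E, n)
print(p13, p_both, p13 * p23)  # 9/50 9/100 81/2500: positively correlated
print(all(a * b == c for a, b, c in (conditional_probs(E, n, v) for v in range(n))))  # True
rng = random.Random(0)
print(polled_degree(E, n, 0, 20000, rng))       # close to deg(0) / n = 0.9
print(polled_codegree(E, n, 1, 2, 20000, rng))  # close to 1/10 (common neighbour 0)
```

In the infinite limit the polling is carried out over infinitely many fresh vertices, so these estimates become exact, which is the mechanism behind Lemma 3.5.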

Remark 3.7. Similar correspondence principles exist for bipartite graphs, directed
graphs, and multicolored graphs (where the color set is kept independent of $m$),
and so forth; for instance, a generalisation to tripartite graphs is sketched out in
Appendix B. We will not pursue the other generalisations here as they are rather
minor, though we will consider a hypergraph extension of this principle in Section
7.

4. An infinitary proof of the triangle removal lemma 

Let us now apply the above correspondence principle to obtain the following triangle- 
removal lemma of Ruzsa and Szemeredi: 

Lemma 4.1 (Triangle removal lemma). [24] Let $G = (V, E)$ be an undirected graph with $|V| = n$ vertices. Suppose that $G$ contains fewer than $\delta n^3$ triangles for some $0 < \delta < 1$, or more precisely

$$|\{(x_1, x_2, x_3) \in V^3 : \{x_1,x_2\}, \{x_2,x_3\}, \{x_3,x_1\} \in E\}| < \delta n^3.$$

Then it is possible to delete $o_{\delta \to 0}(n^2)$ edges from $G$ to create a graph $G'$ which is triangle-free. Here $o_{\delta \to 0}(n^2)$ denotes a quantity which, when divided by $n^2$, goes to zero as $\delta \to 0$, uniformly in $n$.
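The hypothesis of Lemma 4.1 counts ordered triples, so each triangle contributes $3! = 6$ to the left-hand side. The following brute-force sketch is our own illustration and not part of the paper; the function name is hypothetical. It assumes vertices labelled $0, \ldots, n-1$ and edges given as pairs.

```python
from itertools import combinations, product


def ordered_triangle_count(n, edges):
    """|{(x1, x2, x3) in V^3 : {x1,x2}, {x2,x3}, {x3,x1} in E}| for
    V = {0, ..., n-1}. Each triangle is counted 3! = 6 times."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for x1, x2, x3 in product(range(n), repeat=3)
        if frozenset((x1, x2)) in edge_set
        and frozenset((x2, x3)) in edge_set
        and frozenset((x3, x1)) in edge_set
    )


k4 = list(combinations(range(4), 2))     # complete graph on 4 vertices
print(ordered_triangle_count(4, k4))     # 24: four triangles, six orderings each
# Deleting the two edges {0,1} and {2,3} leaves a 4-cycle, which is triangle-free.
c4 = [e for e in k4 if e not in [(0, 1), (2, 3)]]
print(ordered_triangle_count(4, c4))     # 0
```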

Previous to this paper, the only known proof of this lemma proceeded via the Szemeredi regularity lemma [26]. It can be used among other things to imply the $k = 3$ case of Szemeredi's theorem (Theorem 2.1). Based on this connection, it is natural to ask whether there is an infinitary analogue of this lemma, similarly to how Theorem 2.2 is an infinitary counterpart to Theorem 2.1. We shall deduce it from the following substantially stronger infinitary statement.

¹² There is another way of viewing this, namely that each vertex $x_j^{(m)}$ induces a partition of the $x_1^{(m)}$ and $x_2^{(m)}$ vertex sets, by dividing them into those vertices which are connected to $x_j^{(m)}$ in $G^{(m)}$ and those that are not. Letting $j$ vary in $\{4, 5, \ldots, N\}$ one obtains a partition of these vertex classes which behaves increasingly like the partitions created by the Szemeredi regularity lemma as $N \to \infty$, in the sense that the graph between the $x_1^{(m)}$ and $x_2^{(m)}$ classes becomes increasingly "$\varepsilon$-regular" relative to this partition; the $\varepsilon$-regularity is closely related to the relative independence properties discussed here. We will however not pursue this approach as it becomes somewhat complicated when we move to the hypergraph setting, whereas the techniques we present here carry over to hypergraphs with virtually no changes.

If $J$ is a set, we define a downset $\mathfrak{i}$ in $J$ to be any collection of subsets $e$ of $J$ with the property that whenever $e \in \mathfrak{i}$ and $e' \subseteq e$, then $e' \in \mathfrak{i}$ also. In particular, downsets are automatically closed under intersection.
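For finite $J$ the downset condition can be checked mechanically. The following minimal sketch is our own illustration and the helper names are hypothetical. Sets are represented as Python frozensets. The sketch verifies the condition for the downset $\{e \subseteq \{1,2,3\} : |e| \leq 2\}$ used later for the triangle removal lemma and confirms closure under intersection.

```python
from itertools import combinations


def subsets(e):
    """All subsets of the finite set e, as frozensets."""
    items = sorted(e)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


def is_downset(family):
    """True if every subset of every member of the family is a member."""
    fam = {frozenset(e) for e in family}
    return all(s in fam for e in fam for s in subsets(e))


def downset_closure(family):
    """The smallest downset containing every set in the family."""
    return set().union(*(subsets(e) for e in family))


J = {1, 2, 3}
i_max = {e for e in subsets(J) if len(e) <= 2}
print(is_downset(i_max), len(i_max))                        # True 7
print(all((a & b) in i_max for a in i_max for b in i_max))  # True
print(is_downset([{1, 2}]))                                 # False: {1}, {2}, {} missing
```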

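Theorem 4.2 (Hypergraph removal lemma, infinitary version). Let $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ be a probability space, and let $\mathcal{B}_{\mathrm{reg}} \subseteq \mathcal{B}_{\max}$ be an algebra. Let $J$ be a finite set, and let $\mathfrak{i}_{\max}$ be a downset in $J$. For each $e \in \mathfrak{i}_{\max}$ let $\mathcal{B}_e$ be a factor of $\mathcal{B}_{\max}$ with the following properties:

• (Regularisability) Each of the factors $\mathcal{B}_e$ is generated by countably many events from $\mathcal{B}_{\mathrm{reg}}$.

• (Nesting) If $e, e' \in \mathfrak{i}_{\max}$ are such that $e \subseteq e'$, then $\mathcal{B}_e$ is a factor of $\mathcal{B}_{e'}$.

• (Independence) If $e, e_1, \ldots, e_l \in \mathfrak{i}_{\max}$, then the factors $\mathcal{B}_e$ and $\bigvee_{i=1}^l \mathcal{B}_{e_i}$ are relatively independent conditioning on $\bigvee_{i=1}^l \mathcal{B}_{e \cap e_i}$.

For each $e \in \mathfrak{i}_{\max}$, let $E_e$ be an event in $\mathcal{B}_e$ such that

$$\mathbf{P}\Big(\bigwedge_{e \in \mathfrak{i}_{\max}} E_e\Big) = 0.$$

Then for any $\varepsilon > 0$, there exist events $F_e \in \mathcal{B}_e \cap \mathcal{B}_{\mathrm{reg}}$ for all $e \in \mathfrak{i}_{\max}$ such that

$$\mathbf{P}(E_e \setminus F_e) < \varepsilon \quad \text{for all } e \in \mathfrak{i}_{\max}$$

and

$$\bigwedge_{e \in \mathfrak{i}_{\max}} F_e = \emptyset.$$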

We will prove this rather strange-looking proposition in Section 6. For the purposes of proving the triangle removal lemma, we will only need this lemma in the special case when $J = \{1,2,3\}$, when $\mathfrak{i}_{\max} := \{e : |e| \leq 2\}$, and when $E_e = \Omega$ for all $e \neq \{1,2\}, \{2,3\}, \{3,1\}$. However, the lemma is not that much more difficult to prove in the general case¹³, and it will rather easily yield a hypergraph generalisation of the triangle removal lemma, so we retain the proposition in the general form. The hypothesis $\mathbf{P}(\bigwedge_{e \in \mathfrak{i}_{\max}} E_e) = 0$ is the analogue in Lemma 4.1 of the hypothesis that $G$ has few triangles, while the conclusion $\bigwedge_{e \in \mathfrak{i}_{\max}} F_e = \emptyset$ is the analogue of the conclusion that the modified graph $G'$ is triangle-free.

¹³ This is in stark contrast to the finitary situation, in which the hypergraph removal lemma is significantly more difficult than the triangle removal lemma. This is ultimately because of the need in the finitary hypergraph setting to constantly play off epsilons of different sizes against each other; see [19], [20], [22], [23], [12], [30] for some examples of this. However in the infinitary asymptotic limit, most of the epsilons have disappeared or at least been confined to individual lemmas where they do not interact with other epsilons. This simplifies the proof significantly, albeit at the cost of working in an infinitary setting as opposed to a finitary one. In the converse direction, note the proliferation of epsilons in [29] when Furstenberg's proof of Szemeredi's theorem is transferred from the infinitary setting to the finitary one.
Proof [of Lemma 4.1 assuming Theorem 4.2] Suppose for contradiction that Lemma 4.1 failed. Then we can find $0 < \eta < 1$, a sequence $n^{(m)}$ of integers, and a sequence of graphs $G^{(m)} = (V^{(m)}, E^{(m)})$ with $|V^{(m)}| = n^{(m)}$, such that the $G^{(m)}$ have an asymptotically vanishing number of triangles,

$$\lim_{m \to \infty} \frac{1}{(n^{(m)})^3} \big|\{(x_1, x_2, x_3) \in (V^{(m)})^3 : \{x_1,x_2\}, \{x_2,x_3\}, \{x_3,x_1\} \in E^{(m)}\}\big| = 0, \tag{6}$$

but such that each of the $G^{(m)}$ cannot be made triangle-free without deleting at least $\eta (n^{(m)})^2$ edges. (One could make the decay rate in (6) more rapid, as in the proof of Theorem 2.1, but we will find no need to do so here.) In particular, $G^{(m)}$ contains at least one triangle, and hence the expression inside the limit of (6) is at least $1/(n^{(m)})^3$. This implies that

$$n^{(m)} \to \infty \quad \text{as } m \to \infty. \tag{7}$$



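Now let $(\Omega, \mathcal{B}_{\max})$ be the graph universal space introduced in Definition 3.1, with the attendant events $A_{i,j}$, regular algebra $\mathcal{B}_{\mathrm{reg}}$, factors $\mathcal{B}_I$, and $S_\infty$ group action. Let $\mathbf{P}^{(m)}$ be the probability measure on $(\Omega \times \Omega^{(m)}, \mathcal{B}_{\max} \times \mathcal{B}^{(m)}_{\max})$ defined in Definition 3.2, and let $\mathbf{P}^{(\infty)}$ be a limiting measure as constructed in the graph correspondence principle (Proposition 3.4). From (6) we have

$$\lim_{m \to \infty} \mathbf{P}^{(m)}(A_{1,2} \wedge A_{2,3} \wedge A_{3,1}) = 0$$

and hence by (1)

$$\mathbf{P}^{(\infty)}(A_{1,2} \wedge A_{2,3} \wedge A_{3,1}) = 0.$$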

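We will apply Theorem 4.2 on the universal space $(\Omega, \mathcal{B}_{\max}, \mathbf{P}^{(\infty)})$ with $J := \{1,2,3\}$, $\mathfrak{i}_{\max} := \{e : |e| \leq 2\}$, $E_e$ set equal to $A_{i,j}$ if $e = \{i,j\}$ for some $ij = 12, 23, 31$, and $E_e = \Omega$ otherwise, and with $\mathcal{B}_e$ set equal to $\mathcal{B}_{e \cup \{4,5,\ldots\}}$ for all $e \in \mathfrak{i}_{\max}$. The nesting and regularisability properties required for Theorem 4.2 are obvious, while the independence properties follow from Lemma 3.5. We can thus invoke the theorem and find regular events $F_e \in \mathcal{B}_{e \cup \{4,5,\ldots\}} \cap \mathcal{B}_{\mathrm{reg}}$ for $e \in \mathfrak{i}_{\max}$ with

$$\mathbf{P}^{(\infty)}(E_e \setminus F_e) < \eta/100 \quad \text{for } e \in \mathfrak{i}_{\max} \tag{8}$$

such that

$$\bigwedge_{e \in \mathfrak{i}_{\max}} F_e = \emptyset. \tag{9}$$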



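It is convenient to eliminate the lower order components $\emptyset, \{1\}, \{2\}, \{3\}$ of the ideal $\mathfrak{i}_{\max}$. For $ij = 12, 23, 31$, define $F'_{ij} := F_{\{i,j\}} \wedge F_{\{i\}} \wedge F_{\{j\}} \wedge F_{\emptyset}$. Then the $F'_{ij}$ are regular, and (by monotonicity) we have $F'_{ij} \in \mathcal{B}_{\{i,j,4,5,\ldots\}}$. From (8) and the choice of the $E_e$ we have

$$\mathbf{P}^{(\infty)}(A_{ij} \setminus F'_{ij}) < \eta/10 \quad \text{for } ij = 12, 23, 31, \tag{10}$$

while from (9) we have

$$F'_{12} \wedge F'_{23} \wedge F'_{31} = \emptyset. \tag{11}$$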
Now we reinstate the "hidden variables" $x_1^{(m)}, x_2^{(m)}, \ldots$ by viewing $\mathbf{P}^{(m)}$ once again as a probability measure on the product space $(\Omega \times \Omega^{(m)}, \mathcal{B}_{\max} \times \mathcal{B}^{(m)}_{\max})$; in particular $A_{ij}$ is now identified with the event that $\{x_i^{(m)}, x_j^{(m)}\}$ lies in the graph $G^{(m)}$. Now because $A_{ij}$ and $F'_{ij}$ are regular, the quantity $\mathbf{P}^{(m)}(A_{ij} \setminus F'_{ij})$ is the probability of an event involving only finitely many of the random vertices $x_k^{(m)}$ of $V^{(m)}$. Let us say that it involves only the vertices $x_1^{(m)}, \ldots, x_N^{(m)}$ (note that $N$ will be independent of $m$, depending only on the complexity of the events $F'_{ij}$). By increasing $N$ if necessary we may assume $N \geq 3$. Recall that $F'_{ij}$ depends only on the vertices $x_i^{(m)}, x_j^{(m)}$ and $x_4^{(m)}, \ldots, x_N^{(m)}$. For any fixed values of $x_4^{(m)}, \ldots, x_N^{(m)}$, let us say that a vertex pair $\{x, y\} \subseteq V^{(m)}$ is good if for each $ij = 12, 23, 31$, the event $F'_{ij}$ holds true whenever $x, y$ are substituted for either $x_i^{(m)}, x_j^{(m)}$ or $x_j^{(m)}, x_i^{(m)}$. Now define the random subgraph $(G')^{(m)} = (V^{(m)}, (E')^{(m)})$ of $G^{(m)} = (V^{(m)}, E^{(m)})$ by setting $(E')^{(m)}$ to be all the good pairs $\{x, y\}$ in $E^{(m)}$; this graph depends on the random variables $x_4^{(m)}, \ldots, x_N^{(m)}$. From (10) (transferred to $\mathbf{P}^{(m)}$ by (1), since the events $A_{ij} \setminus F'_{ij}$ are regular) we see that, for $m$ sufficiently large,

$$\mathbf{E}\,|E^{(m)} \setminus (E')^{(m)}| < \eta (n^{(m)})^2.$$

Also, we observe that regardless of the values of $x_4^{(m)}, \ldots, x_N^{(m)}$, the graph $(G')^{(m)}$ cannot contain any triangles, as this would contradict (9). But by the pigeonhole principle we can find a deterministic representative of the random graph $(G')^{(m)}$ for which

$$|E^{(m)} \setminus (E')^{(m)}| < \eta (n^{(m)})^2,$$

and so we have made $G^{(m)}$ triangle-free by removing fewer than $\eta (n^{(m)})^2$ edges, a contradiction that establishes Lemma 4.1. ■



Remark 4.3. The same arguments in fact give a subgraph removal lemma, in which the triangle is replaced by another fixed subgraph. The proof is the same; it is only the downset $\mathfrak{i}_{\max}$ (and some minor numerical factors in the argument) which change significantly. But in all these cases, the elements in the downset will only have cardinality at most two. We will not give the details here since they will be subsumed by the hypergraph removal lemma in Theorem 8.1. The higher order cases of Theorem 4.2, involving sets $e$ of three or more elements, do not actually get used in graph theory (which is ultimately only concerned with finite boolean combinations of relations that involve at most two vertices at a time), and are only of importance for hypergraph theory (in which one must now consider combinations of relations, each of which involves three or more vertices).



As observed in [30], there is a slightly stronger version of the triangle removal lemma which gives some further complexity information on $G'$, at the expense of conceding that $G'$ need not be a subgraph of $G$. More precisely, we have

Lemma 4.4 (Strong triangle removal lemma). Let $G = (V, E)$ be an undirected graph with $|V| = n$ vertices. Suppose that $G$ contains fewer than $\delta n^3$ triangles for some $0 < \delta < 1$. Then one can find a triangle-free graph $G'$ with $G \setminus G'$ containing fewer than $o_{\delta \to 0}(n^2)$ edges. Furthermore, there exists a partition of $V$ into $O_\delta(1)$ cells, such that when restricted to the edges joining any two of these cells (which could be equal), $G'$ is either a complete graph or an empty graph.

This stronger version of the lemma is a by-product of the usual proof of Lemma 
4.1, as the graph G' is constructed by excluding certain bad pairs of Szemeredi 
cells from the graph G. It turns out that the infinitary approach can also yield this 
stronger lemma without much difficulty. 

Proof We again argue by contradiction. But this time, the contradiction hypothesis yields a more complicated statement. More precisely, if Lemma 4.4 failed, then we can find $0 < \eta < 1$, a sequence $n^{(m)}$ of integers, a sequence of graphs $G^{(m)} = (V^{(m)}, E^{(m)})$ with $|V^{(m)}| = n^{(m)}$, and a sequence $M^{(m)}$ tending to infinity as $m \to \infty$, such that (6) holds, but such that there does not exist any triangle-free graph $G'$ for which $G^{(m)} \setminus G'$ has fewer than $\eta (n^{(m)})^2$ edges, and for which there exists a partition of $V^{(m)}$ into $M^{(m)}$ or fewer cells, such that when restricted to the edges joining any two of these cells, $G'$ is either the complete graph or the empty graph.

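We now repeat all the arguments used to prove Lemma 4.1, until we get to the point where we have created regular events $F'_{ij} \in \mathcal{B}_{\{i,j,4,5,\ldots\}}$ obeying (10) and (11). Now we insert an additional step to lower the complexity of the events $F'_{ij}$. Observe that $\mathcal{B}_{\{i,j,4,5,\ldots\}}$ is generated by the factor $\mathcal{B}_{\{i,4,5,\ldots\}} \vee \mathcal{B}_{\{j,4,5,\ldots\}}$, together with the additional event $A_{ij}$. Thus we can write $F'_{ij} \wedge A_{ij} = F''_{ij} \wedge A_{ij}$ for some regular event $F''_{ij} \in \mathcal{B}_{\{i,4,5,\ldots\}} \vee \mathcal{B}_{\{j,4,5,\ldots\}}$ (for instance, by replacing every occurrence of $A_{ij}$ in the boolean expression for $F'_{ij}$ with $\Omega$). From (10) we have

$$\mathbf{P}^{(\infty)}(A_{ij} \setminus F''_{ij}) < \eta/10 \quad \text{for } ij = 12, 23, 31.$$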

Now we argue that we still have the analogue of (11), namely

$$F''_{12} \wedge F''_{23} \wedge F''_{31} = \emptyset. \tag{12}$$

From (11) we already have

$$(F''_{12} \wedge F''_{23} \wedge F''_{31}) \cap (A_{1,2} \wedge A_{2,3} \wedge A_{3,1}) = \emptyset. \tag{13}$$

But the regular event $F''_{12} \wedge F''_{23} \wedge F''_{31}$ is a boolean combination of finitely many events $A_{ij}$, where at most one of the $i, j$ lies in $\{1,2,3\}$. In other words, this combination does not involve $A_{1,2}$, $A_{2,3}$, or $A_{3,1}$. Thus if (12) failed, so that there was an infinite graph $(\mathbb{N}, E_\infty)$ lying in $F''_{12} \wedge F''_{23} \wedge F''_{31}$, we could modify the graph on the edges $\{1,2\}, \{2,3\}, \{3,1\}$ so that it also lies in the set in (13), a contradiction.

To summarise, we can safely replace $F'_{ij}$ by the lower complexity event $F''_{ij}$. Now we continue the argument in the proof of Lemma 4.1 with this replacement, but define the edges of $(G')^{(m)}$ to be all the good pairs $\{x, y\}$ of distinct vertices in $V^{(m)}$ rather than in $E^{(m)}$. This means that $(G')^{(m)}$ is no longer a subgraph of $G^{(m)}$, but the property of being good is determined entirely by the regular events $F''_{ij}$, which in turn only involve finitely many events $A_{ij}$ with at most one of the $i, j$ lying in $\{1,2,3\}$. Inspecting the definition of a good pair, we see that for fixed $x_4^{(m)}, \ldots, x_N^{(m)}$ (with $N$ sufficiently large), the graph $(G')^{(m)}$ has bounded complexity, in the sense that there is a partition of $V^{(m)}$ into $M$ cells, for some $M$ depending only on the $F''_{ij}$, such that when restricted to the edges joining any two of these cells, $G'$ is either the complete graph or the empty graph. But for $m$ sufficiently large we have $M^{(m)} > M$, and so we attain the same contradiction as before. ■
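The bounded-complexity step can be seen concretely. Each vertex is classified by its adjacency pattern to the fixed vertices $x_4^{(m)}, \ldots, x_N^{(m)}$, which gives at most $2^{N-3}$ cells. Any pair property that only consults how $x$ and $y$ connect to those fixed vertices is then constant on each pair of cells. The following is a minimal sketch of this, under our own assumptions: the predicate `good` is a hypothetical stand-in for the events $F''_{ij}$, chosen only because it has this dependence structure, and the random graph is arbitrary.

```python
import random
from itertools import combinations, combinations_with_replacement


def adjacency_partition(adj, fixed):
    """Group vertices by their adjacency pattern to the fixed vertices;
    there are at most 2 ** len(fixed) cells."""
    cells = {}
    for v in adj:
        pattern = tuple(s in adj[v] for s in fixed)
        cells.setdefault(pattern, set()).add(v)
    return list(cells.values())


def good(adj, fixed, x, y):
    """Hypothetical stand-in for the 'good pair' test: it consults only how
    x and y connect to the fixed vertices, never whether {x, y} is an edge."""
    return any(s in adj[x] and s in adj[y] for s in fixed)


def constant_on_cell_pairs(adj, fixed, cells):
    """True if, for every pair of cells (possibly equal), the pairs {x, y}
    with x, y in those cells are either all good or all not good."""
    for a, b in combinations_with_replacement(cells, 2):
        values = {good(adj, fixed, x, y) for x in a for y in b if x != y}
        if len(values) > 1:
            return False
    return True


rng = random.Random(1)
adj = {v: set() for v in range(60)}
for u, v in combinations(range(60), 2):
    if rng.random() < 0.3:
        adj[u].add(v)
        adj[v].add(u)

fixed = [0, 1, 2, 3]                 # plays the role of x_4, ..., x_N
cells = adjacency_partition(adj, fixed)
print(len(cells) <= 2 ** len(fixed))               # True
print(constant_on_cell_pairs(adj, fixed, cells))   # True
```

With such a partition in hand, the graph of good pairs is complete or empty between any two cells. This is exactly the bounded-complexity property used above.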



5. The uniform intersection property 



We now build the machinery necessary to prove the infinitary hypergraph removal lemma (Theorem 4.2). Again, we will be motivated by the example from ergodic theory. Furstenberg's proof [7], [10], [8] of the Furstenberg recurrence theorem (Theorem 2.2) proceeded by a kind of induction on factors. Let us say that a factor $\mathcal{B}$ of $\mathcal{B}_{\max}$ obeys the uniform multiple recurrence (UMR) property if the conclusion of Theorem 2.2 holds true whenever $E \in \mathcal{B}$ and $\mathbf{P}(E) > 0$. Thus for instance the trivial factor $\{\emptyset, \Omega\}$ has the UMR property. One then shows that the UMR property is preserved under three operations: weakly mixing extensions; limits of chains; and compact (or finite rank) extensions. An application of Zorn's lemma¹⁴ then allows one to conclude that the maximal factor $\mathcal{B}_{\max}$ obeys the UMR property.

We will adopt a similar strategy here, based around a certain property of families of factors which we call the uniform intersection property (UIP). This property is again trivial for very small families, and will be preserved under the same three operations of weakly mixing extensions, limits of chains, and finite rank extensions. Because of the finiteness of $J$ in Theorem 4.2, we will only need to apply these operations finitely often, and will not require Zorn's lemma. However it does seem likely that there are extensions of this theorem to the case when $J$ is infinite, and (more interestingly) to the case where the sets $e$ in $\mathfrak{i}_{\max}$ can be unbounded or even countably infinite. We will not pursue this matter here.

We begin by stating the UIP. 

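Definition 5.1 (Uniform intersection property). Let $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ be a probability space, and let $\mathcal{B}_{\mathrm{reg}}$ be an algebra in $\mathcal{B}_{\max}$. We say that a tuple $(\mathcal{B}_i)_{i \in I}$ of factors has the uniform intersection property (UIP) if the following holds: given any tuple $(E_i)_{i \in I}$ of events $E_i \in \mathcal{B}_i$ with $\mathbf{P}(\bigwedge_{i \in I} E_i) = 0$, and given any $\varepsilon > 0$, there exists a tuple $(F_i)_{i \in I}$ of regular events $F_i \in \mathcal{B}_i \cap \mathcal{B}_{\mathrm{reg}}$ with $\mathbf{P}(E_i \setminus F_i) < \varepsilon$ for each $i \in I$ such that $\bigwedge_{i \in I} F_i = \emptyset$.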

Remark 5.2. Roughly speaking, the UIP asserts that if events $E_i$ from separate factors $\mathcal{B}_i$ have a null intersection, then this fact can be almost entirely "explained" by regular events $F_i \in \mathcal{B}_i$ which have empty intersection. Thus, for instance, the conclusion of Theorem 4.2 is simply that the tuple $(\mathcal{B}_e)_{e \in \mathfrak{i}_{\max}}$ obeys the UIP.



¹⁴ Actually, to establish Theorem 2.2 for a fixed $k$, one only needs to apply the limits-of-chains step a finite number of times depending on $k$, at which point one reaches a factor which is characteristic for the maximal factor $\mathcal{B}_{\max}$; one can then jump directly to $\mathcal{B}_{\max}$ without using Zorn's lemma. Thus the proof of the Furstenberg recurrence theorem does not actually require the axiom of choice, and indeed the original proof in [7] did not use this axiom.
Before continuing, let us illustrate the UIP with a few simple examples. All of these examples take place in some probability space $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ with an algebra $\mathcal{B}_{\mathrm{reg}}$ of regular events. We say that a factor $\mathcal{B}$ is regularisable if it can be generated by at most countably many regular events.

Example 5.3. The empty tuple $()$ obeys the UIP in a vacuous sense (the hypothesis $\mathbf{P}(\bigwedge_{i \in \emptyset} E_i) = 0$ is impossible to satisfy, since the empty conjunction is the full event $\Omega$).

Example 5.4. Let $\mathcal{B}$ be a factor. Then the singleton tuple $(\mathcal{B})$ trivially has the UIP (indeed one can even take $\varepsilon = 0$ and $F = \emptyset$ in this case).

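Example 5.5. Let $\mathcal{B}$ be a regularisable factor. Then the 2-tuple $(\mathcal{B}, \mathcal{B})$ has the UIP. Indeed, if $E, E' \in \mathcal{B}$ were such that $\mathbf{P}(E \wedge E') = 0$, then from Lemma A.19 one can find regular events $F, F' \in \mathcal{B}$ which are $\varepsilon/3$-close to $E, E'$ respectively. By the triangle inequality this implies that $\mathbf{P}(F \wedge F') < 2\varepsilon/3$. If we set $\tilde F := F \setminus F'$ and $\tilde F' := F' \setminus F$ then we see that $\tilde F, \tilde F'$ are regular events in $\mathcal{B}$ with $\tilde F \wedge \tilde F' = \emptyset$, while from the triangle inequality $\mathbf{P}(E \setminus \tilde F), \mathbf{P}(E' \setminus \tilde F') < \varepsilon$, and the claim follows. For a generalization of this argument, see Lemma 5.11 below.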

Example 5.6. The trivial factor $\{\emptyset, \Omega\}$ has no impact on the UIP. More precisely, a tuple $(\mathcal{B}_i)_{i \in I}$ obeys the UIP if and only if $(\mathcal{B}_i)_{i \in I} \uplus (\{\emptyset, \Omega\})$ also obeys the UIP, where we use $\uplus$ to denote the concatenation of tuples.

Example 5.7. Let $(\mathcal{B}_1, \ldots, \mathcal{B}_l)$ be a tuple of jointly independent factors. Then $(\mathcal{B}_1, \ldots, \mathcal{B}_l)$ has the UIP. Indeed, if $E_j \in \mathcal{B}_j$ then by joint independence we have $\mathbf{P}(\bigwedge_{1 \leq j \leq l} E_j) = \prod_{j=1}^l \mathbf{P}(E_j)$. Thus if $\bigwedge_{1 \leq j \leq l} E_j$ is a null event, then one of the $E_j$, say $E_{j_0}$, must also be a null event. The claim then follows by letting $F_{j_0} = \emptyset$ and letting all the other $F_j$ be the full event $\Omega$. For a more sophisticated version of this argument, see Lemma 5.12 below.
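A finite model of Example 5.7 makes the argument concrete. Take $\Omega = \{0,1\}^3$ with a product measure and let $\mathcal{B}_j$ be generated by the $j$-th coordinate, so that the three factors are jointly independent. The biases below are arbitrary illustrative choices; one of them is $0$ so that a null intersection actually occurs. Exact rational arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

# P(coordinate j equals 1); the third coordinate is almost surely 0.
bias = [Fraction(1, 2), Fraction(1, 3), Fraction(0)]
OMEGA = list(product((0, 1), repeat=3))


def weight(omega):
    """Product-measure weight of a single outcome."""
    w = Fraction(1)
    for bit, p in zip(omega, bias):
        w *= p if bit else 1 - p
    return w


def prob(event):
    return sum((weight(w) for w in event), Fraction(0))


# E_j = {omega_j = 1} lies in the factor B_j generated by coordinate j.
E = [{w for w in OMEGA if w[j] == 1} for j in range(3)]
joint = prob(E[0] & E[1] & E[2])
print(joint == bias[0] * bias[1] * bias[2])   # True (joint independence)

# The intersection is null, so some E_j is null: take F_{j0} empty, F_j = OMEGA.
j0 = next(j for j in range(3) if prob(E[j]) == 0)
F = [set() if j == j0 else set(OMEGA) for j in range(3)]
print(j0, set.intersection(*F))               # 2 set()
```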

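Example 5.8. Let $\Omega$ be the unit interval $[0, 1]$ with Lebesgue measure, let $\mathcal{B}_{\mathrm{reg}}$ consist of all the finite unions of intervals (open, closed, or half-open), let $\mathcal{B}_1$ be the factor generated by the event $E_1 := [0, 1/2]$, and let $\mathcal{B}_2$ be the factor generated by the event $E_2 := [1/2, 1]$. Then $(\mathcal{B}_1, \mathcal{B}_2)$ does not have the UIP: $E_1 \wedge E_2 = \{1/2\}$ is null, but for $\varepsilon < 1/2$ the only admissible choices are $F_1 \in \{E_1, \Omega\}$ and $F_2 \in \{E_2, \Omega\}$, and every such pair has $1/2$ in its intersection. However if one modifies $\mathcal{B}_2$ to be the factor generated by $(1/2, 1]$ instead of $[1/2, 1]$, then the UIP is restored. Thus the UIP is sensitive to modification of the underlying factors by null events. (On the other hand, the events $E_i$ themselves can be modified by null events within $\mathcal{B}_i$ without any impact to the UIP.)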

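Example 5.9. Let $\mathcal{B}_1, \mathcal{B}_2$ be finite $\sigma$-algebras, and let $\mathcal{B}$ be another $\sigma$-algebra, such that $(\mathcal{B}_1, \mathcal{B}_2, \mathcal{B})$ has the UIP. Then $(\mathcal{B}_1 \vee \mathcal{B}_2, \mathcal{B})$ also has the UIP. To see this, let $E_{12} \in \mathcal{B}_1 \vee \mathcal{B}_2$ and $E \in \mathcal{B}$ be such that $\mathbf{P}(E_{12} \wedge E) = 0$. Since $\mathcal{B}_1$ and $\mathcal{B}_2$ are finite, we can write $E_{12}$ as the union of $M$ events of the form $E_{1,m} \wedge E_{2,m}$ for $1 \leq m \leq M$, for some finite $M$ and some events $E_{1,m} \in \mathcal{B}_1$, $E_{2,m} \in \mathcal{B}_2$. From the UIP hypothesis we can find regular events $F_{1,m} \in \mathcal{B}_1$, $F_{2,m} \in \mathcal{B}_2$, $F_m \in \mathcal{B}$ with $F_{1,m} \wedge F_{2,m} \wedge F_m = \emptyset$, and $\mathbf{P}(E_{1,m} \setminus F_{1,m}), \mathbf{P}(E_{2,m} \setminus F_{2,m}), \mathbf{P}(E \setminus F_m) < \varepsilon/2M$. If we then set $F_{12} := \bigvee_{m=1}^M (F_{1,m} \wedge F_{2,m})$ and $F := \bigwedge_{m=1}^M F_m$ then the claim follows. For a generalization of this argument, see Lemma 5.13 below.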
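The decomposition step in Example 5.9 is elementary when $\mathcal{B}_1, \mathcal{B}_2$ are generated by finite partitions of $\Omega$. The atoms of $\mathcal{B}_1 \vee \mathcal{B}_2$ are the nonempty intersections $A_1 \wedge A_2$ of cells, and any event of the join is a union of such atoms, so one may take $M$ to be at most the number of pairs of cells. The following is our own finite illustration, with hypothetical partitions of a twelve-point space. It assumes that its input event really lies in $\mathcal{B}_1 \vee \mathcal{B}_2$.

```python
from itertools import product


def atoms_of_join(P1, P2):
    """Atoms of B1 v B2 for partitions P1, P2: nonempty intersections of cells."""
    return [(A1, A2) for A1, A2 in product(P1, P2) if A1 & A2]


def decompose(E12, P1, P2):
    """Write E12 (assumed to lie in B1 v B2) as a union of events
    E_{1,m} & E_{2,m}, with E_{1,m} a cell of P1 and E_{2,m} a cell of P2."""
    return [(A1, A2) for A1, A2 in atoms_of_join(P1, P2) if (A1 & A2) <= E12]


OMEGA = set(range(12))
P1 = [set(range(0, 6)), set(range(6, 12))]
P2 = [{w for w in OMEGA if w % 3 == r} for r in range(3)]
E12 = (P1[0] & P2[1]) | (P1[1] & P2[0]) | (P1[1] & P2[2])

pieces = decompose(E12, P1, P2)
union = set().union(*(A1 & A2 for A1, A2 in pieces))
print(len(pieces), union == E12)   # 3 True
```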

Remark 5.10. If $(\mathcal{B}_i)_{i \in I}$ has the UIP, then given any tuple $(E_i)_{i \in I}$ of events $E_i \in \mathcal{B}_i$ such that $\bigwedge_{i \in I} E_i$ is a null event, there exists a tuple $(G_i)_{i \in I}$ of null events $G_i \in \mathcal{B}_i$ which cover the null event $\bigwedge_{i \in I} E_i$. This follows from applying the UIP with $\varepsilon = 2^{-n}$ (say) to obtain events $(F_{i,n})_{i \in I}$ with $\mathbf{P}(E_i \setminus F_{i,n}) < 2^{-n}$ for all $i \in I$ and $\bigwedge_{i \in I} F_{i,n} = \emptyset$, and then letting $G_i$ be the event that $E_i$ holds, but that $F_{i,n}$ fails for infinitely many $n \geq 1$. The claim $\bigwedge_{i \in I} E_i \subseteq \bigvee_{i \in I} G_i$ then follows from the pigeonhole principle, while the claim that $G_i$ is null follows from the Borel–Cantelli lemma. Unfortunately the $G_i$ will in general not be regular, and so this consequence of the UIP, while simple to state, is not useful for applications.

We now develop the general tools that we shall use to deduce the UIP for complex tuples from the UIP for simpler tuples. We first show that repetitions do not affect the UIP so long as the $\sigma$-algebra being repeated is regularisable. We use $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B}_j)_{j \in J} = (\mathcal{B}_i)_{i \in I \uplus J}$ to denote the concatenation of two tuples, where $I \uplus J$ is the disjoint union of $I$ and $J$ (thus one may have to relabel the index set $I$ or $J$ in order to define this concatenation).

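Lemma 5.11. Let $(\mathcal{B}_i)_{i \in I}$ be a tuple of $\sigma$-algebras, and let $\mathcal{B}$ be a regularisable $\sigma$-algebra. Then $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B})$ has the UIP if and only if $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B}, \mathcal{B})$ does.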

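Proof First suppose that $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B}, \mathcal{B})$ has the UIP. Then if $E_i \in \mathcal{B}_i$ and $E \in \mathcal{B}$ are such that $\mathbf{P}(E \wedge \bigwedge_{i \in I} E_i) = 0$, then by the UIP hypothesis (inserting a dummy event $\Omega$ for the extra copy of $\mathcal{B}$) we can find regular events $F_i \in \mathcal{B}_i$ and $F', F'' \in \mathcal{B}$ such that $F' \wedge F'' \wedge \bigwedge_{i \in I} F_i = \emptyset$ and

$$\mathbf{P}(E_i \setminus F_i),\ \mathbf{P}(E \setminus F'),\ \mathbf{P}(\Omega \setminus F'') < \varepsilon/2 \quad \text{for all } i \in I.$$

The claim then follows by setting $F := F' \wedge F''$.

Now suppose conversely that $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B})$ has the UIP. Then if $E_i \in \mathcal{B}_i$ and $E, E' \in \mathcal{B}$ are such that $\mathbf{P}(E \wedge E' \wedge \bigwedge_{i \in I} E_i) = 0$, then by the UIP hypothesis (replacing $E$ and $E'$ by the single event $E \wedge E'$) one can find regular events $F_i \in \mathcal{B}_i$ and $F \in \mathcal{B}$ such that $F \wedge \bigwedge_{i \in I} F_i = \emptyset$ and

$$\mathbf{P}(E_i \setminus F_i),\ \mathbf{P}((E \wedge E') \setminus F) < \varepsilon/3 \quad \text{for all } i \in I.$$

Now since $\mathcal{B}$ is regularisable, we see from Lemma A.19 that every event in $\mathcal{B}$ is $\varepsilon$-close to a regular event in $\mathcal{B}$ for any $\varepsilon > 0$. In particular, we can find regular events $\tilde E, \tilde E' \in \mathcal{B}$ which are $\varepsilon/3$-close to $E$ and $E'$ respectively. If one then sets $\tilde F := (\tilde E \setminus \tilde E') \vee F$ and $\tilde F' := (\tilde E' \setminus \tilde E) \vee F$, we see from the triangle inequality that $\mathbf{P}(E \setminus \tilde F), \mathbf{P}(E' \setminus \tilde F') < \varepsilon$, and that $\tilde F \wedge \tilde F' \wedge \bigwedge_{i \in I} F_i = \emptyset$, and the claim follows. ■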
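The last step of the proof of Lemma 5.11 rests on two set-theoretic facts. The first is that $\tilde F \wedge \tilde F' = F$ exactly, which gives the emptiness claim. The second is the inclusion $E \setminus \tilde F \subseteq ((E \wedge E') \setminus F) \cup (E \,\triangle\, \tilde E) \cup (E' \,\triangle\, \tilde E')$, and its mirror image for $E'$, which gives $\mathbf{P}(E \setminus \tilde F) < \varepsilon/3 + \varepsilon/3 + \varepsilon/3$. The following randomized check on finite sets is our own illustration; the sets are arbitrary and carry no measure or regularity structure.

```python
import random


def check_lemma_5_11_identities(rng, size=40):
    """Draw arbitrary finite sets and test the identities used in Lemma 5.11."""
    omega = range(size)

    def rand_set():
        return {w for w in omega if rng.random() < 0.5}

    E, E_p, E_t, E_tp, F = (rand_set() for _ in range(5))
    F_t = (E_t - E_tp) | F          # F~  := (E~ \ E~') v F
    F_tp = (E_tp - E_t) | F         # F~' := (E~' \ E~) v F
    bad = ((E & E_p) - F) | (E ^ E_t) | (E_p ^ E_tp)
    meet_is_F = (F_t & F_tp) == F
    covered = (E - F_t) <= bad
    covered_p = (E_p - F_tp) <= bad
    return meet_is_F and covered and covered_p


print(all(check_lemma_5_11_identities(random.Random(s)) for s in range(500)))  # True
```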

Now we give the three major extendability properties of the UIP, under weakly 
mixing extensions, finite rank extensions, and limits of chains. We begin with the 
analogue of the weakly mixing extension property, which says that one can extend 
any member of a tuple without destroying the UIP, as long as the extension is 
relatively independent of all the other factors in the tuple. 

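Lemma 5.12 (Weakly mixing extensions). Let $(\mathcal{B}_i)_{i \in I}$ be a tuple of $\sigma$-algebras, and let $\mathcal{B}$ be an additional $\sigma$-algebra such that $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B})$ obeys the UIP. Let $\mathcal{B}'$ be an extension of $\mathcal{B}$ which is relatively independent of $\bigvee_{i \in I} \mathcal{B}_i$ over $\mathcal{B}$. Then $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B}')$ also obeys the UIP.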
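Proof Let $E_i \in \mathcal{B}_i$ for $i \in I$ and $E' \in \mathcal{B}'$ be events such that $\mathbf{P}(E' \wedge \bigwedge_{i \in I} E_i) = 0$. We rewrite this as

$$\mathbf{E}\Big(I(E') \prod_{i \in I} I(E_i)\Big) = 0.$$

The first factor is measurable in $\mathcal{B}'$, while the second factor is measurable in $\bigvee_{i \in I} \mathcal{B}_i$. Since these two $\sigma$-algebras are relatively independent over $\mathcal{B}$, we may use (16) and conclude that $\mathbf{P}(E' | \mathcal{B}) \, \mathbf{E}(\prod_{i \in I} I(E_i) | \mathcal{B}) = 0$ almost surely. Let $E \in \mathcal{B}$ be the support of $\mathbf{P}(E' | \mathcal{B})$ (which is determined only up to a null event in $\mathcal{B}$); then $E \wedge \bigwedge_{i \in I} E_i$ is a null event. Applying the UIP hypothesis, we can find regular events $F_i \in \mathcal{B}_i$ for $i \in I$ and a regular event $F \in \mathcal{B}$ such that $\mathbf{P}(E_i \setminus F_i), \mathbf{P}(E \setminus F) < \varepsilon$ and

$$F \wedge \bigwedge_{i \in I} F_i = \emptyset.$$

We then set $F' := F$. We will be done as soon as we check that $\mathbf{P}(E' \setminus F') < \varepsilon$, which will follow if we can show that $\mathbf{P}(E' \setminus E) = 0$. But

$$\mathbf{P}(E' \setminus E) = \mathbf{E}\big(I(E')(1 - I(E))\big) = \mathbf{E}\big(\mathbf{P}(E' | \mathcal{B})(1 - I(E))\big)$$

since $1 - I(E)$ is $\mathcal{B}$-measurable. But this vanishes by the definition of $E$. ■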



Next we turn to the preservation of the UIP under compact extensions (or more accurately "finite rank extensions"), which asserts that one can extend any given element of a tuple by finite factors of other elements in the tuple (destroying those elements in the process).

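Lemma 5.13 (Finite rank extensions). Let $\mathcal{B}_0$ be a $\sigma$-algebra, let $\mathcal{B}_1, \ldots, \mathcal{B}_l$ be factors of $\mathcal{B}_0$, and let $\mathcal{B}'_1, \ldots, \mathcal{B}'_l$ be finite $\sigma$-algebras for some $l \geq 1$. Let $(\mathcal{B}_i)_{i \in I}$ be an additional tuple of $\sigma$-algebras. Then if $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B}_1 \vee \mathcal{B}'_1, \ldots, \mathcal{B}_l \vee \mathcal{B}'_l, \mathcal{B}_0)$ has the UIP, then $(\mathcal{B}_i)_{i \in I} \uplus (\mathcal{B}_0 \vee \mathcal{B}'_1 \vee \ldots \vee \mathcal{B}'_l)$ also has the UIP.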



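Proof Write $\mathcal{B}_* := \mathcal{B}_0 \vee \mathcal{B}'_1 \vee \ldots \vee \mathcal{B}'_l$. Let $E_*$ be an event in $\mathcal{B}_*$, and let $E_i \in \mathcal{B}_i$ for $i \in I$ be such that $\mathbf{P}(E_* \wedge \bigwedge_{i \in I} E_i) = 0$. Since $\mathcal{B}'_1, \ldots, \mathcal{B}'_l$ are finite, we can write $E_*$ as the finite union of events

$$E_* = E_{*,1} \vee \ldots \vee E_{*,M}$$

for some $M \geq 1$, where each $E_{*,m}$ has the form

$$E_{*,m} = E_{0,m} \wedge E_{1,m} \wedge \ldots \wedge E_{l,m}$$

for some events $E_{0,m} \in \mathcal{B}_0$ and $E_{j,m} \in \mathcal{B}'_j$ for $1 \leq j \leq l$. For each $1 \leq m \leq M$, we have $\mathbf{P}(E_{*,m} \wedge \bigwedge_{i \in I} E_i) = 0$ and hence

$$\mathbf{P}\Big(E_{0,m} \wedge E_{1,m} \wedge \ldots \wedge E_{l,m} \wedge \bigwedge_{i \in I} E_i\Big) = 0.$$

Observe that $E_{j,m} \in \mathcal{B}_j \vee \mathcal{B}'_j$ for $1 \leq j \leq l$, and hence by the UIP hypothesis we may find regular events $F_{0,m} \in \mathcal{B}_0$, $F_{j,m} \in \mathcal{B}_j \vee \mathcal{B}'_j \subseteq \mathcal{B}_*$ for $1 \leq j \leq l$, and $F_{i,m} \in \mathcal{B}_i$ for $i \in I$ such that

$$\mathbf{P}(E_{0,m} \setminus F_{0,m}),\ \mathbf{P}(E_{j,m} \setminus F_{j,m}),\ \mathbf{P}(E_i \setminus F_{i,m}) < \frac{\varepsilon}{M(l+1)}$$

for $1 \leq j \leq l$ and $i \in I$, and

$$\bigwedge_{j=0}^{l} F_{j,m} \wedge \bigwedge_{i \in I} F_{i,m} = \emptyset.$$

Thus if we set $F_* := \bigvee_{m=1}^M \big(\bigwedge_{j=1}^l F_{j,m} \wedge F_{0,m}\big)$ and $F_i := \bigwedge_{m=1}^M F_{i,m}$ for $i \in I$, then $F_* \in \mathcal{B}_*$ and $F_i \in \mathcal{B}_i$ are regular events, $F_* \wedge \bigwedge_{i \in I} F_i = \emptyset$, and

$$\mathbf{P}(E_* \setminus F_*) \leq \sum_{m=1}^M \sum_{j=0}^l \mathbf{P}(E_{j,m} \setminus F_{j,m}) < M(l+1) \cdot \frac{\varepsilon}{M(l+1)} = \varepsilon$$

and

$$\mathbf{P}(E_i \setminus F_i) \leq \sum_{m=1}^M \mathbf{P}(E_i \setminus F_{i,m}) < M \cdot \frac{\varepsilon}{M(l+1)} < \varepsilon,$$

and the claim follows. ■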



Finally, we consider the preservation of the UIP under limits of chains assuming a 
certain relative independence property. 

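Lemma 5.14 (Limits of chains). Let $A$ be a totally ordered set, let $I$ be a finite index set, and for each $\alpha \in A$ let $(\mathcal{B}_{\alpha,i})_{i \in I}$ be a tuple of $\sigma$-algebras obeying the UIP, which is increasing in the sense that $\mathcal{B}_{\alpha,i}$ is a factor of $\mathcal{B}_{\beta,i}$ whenever $\alpha < \beta$ and $i \in I$. Let $\mathcal{B}_i := \bigvee_{\alpha \in A} \mathcal{B}_{\alpha,i}$, and suppose that whenever $i \in I$ and $\alpha \in A$, the $\sigma$-algebras $\mathcal{B}_i$ and $\bigvee_{j \in I \setminus \{i\}} \mathcal{B}_{\alpha,j}$ are relatively independent over $\mathcal{B}_{\alpha,i}$. Then the tuple $(\mathcal{B}_i)_{i \in I}$ also obeys the UIP.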



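Proof Let $E_i \in \mathcal{B}_i$ for $i \in I$ be such that $\mathbf{P}(\bigwedge_{i \in I} E_i) = 0$. From Corollary A.20, we see that for each $i \in I$ there exists an $\alpha \in A$ such that $E_i$ is $\varepsilon/(4(|I|+1)^2)$-close to an event in $\mathcal{B}_{\alpha,i}$. Since there are only finitely many $i$, and the $\mathcal{B}_{\alpha,i}$ increase with $\alpha$, we can make this $\alpha$ uniform in $i$. This implies in particular that $\|I(E_i) - \mathbf{P}(E_i | \mathcal{B}_{\alpha,i})\|_{L^2} < \varepsilon^{1/2}/(2(|I|+1))$ for all $i \in I$, since the orthogonal projection $\mathbf{P}(E_i | \mathcal{B}_{\alpha,i})$ is the nearest $\mathcal{B}_{\alpha,i}$-measurable random variable to $I(E_i)$ in the $L^2$ metric.

Let $E_{\alpha,i} \in \mathcal{B}_{\alpha,i}$ denote the event that $\mathbf{P}(E_i | \mathcal{B}_{\alpha,i}) > \frac{|I|}{|I|+1}$ (this event is only defined up to null events in $\mathcal{B}_{\alpha,i}$). Then by Chebyshev's inequality we have

$$\mathbf{P}(E_i \setminus E_{\alpha,i}) \leq \mathbf{P}\Big(|I(E_i) - \mathbf{P}(E_i | \mathcal{B}_{\alpha,i})| \geq \frac{1}{|I|+1}\Big) < \varepsilon/2$$

for each $i \in I$. Now let $A$ denote the event $A := \bigwedge_{i \in I} E_{\alpha,i}$. Since $\mathbf{P}(\bigwedge_{i \in I} E_i) = 0$, we have

$$\mathbf{P}(A) \leq \sum_{i \in I} \mathbf{P}(A \setminus E_i). \tag{14}$$

On the other hand, we have

$$\mathbf{P}(A \setminus E_i) = \mathbf{E}\Big(I(E_{\alpha,i} \setminus E_i) \prod_{j \in I \setminus \{i\}} I(E_{\alpha,j})\Big).$$

Using the relative independence hypothesis, we conclude

$$\mathbf{P}(A \setminus E_i) = \mathbf{E}\Big(\mathbf{P}(E_{\alpha,i} \setminus E_i | \mathcal{B}_{\alpha,i}) \prod_{j \in I \setminus \{i\}} I(E_{\alpha,j})\Big).$$

But by definition of $E_{\alpha,i}$ we have

$$\mathbf{P}(E_{\alpha,i} \setminus E_i | \mathcal{B}_{\alpha,i}) = I(E_{\alpha,i})\big(1 - \mathbf{P}(E_i | \mathcal{B}_{\alpha,i})\big) \leq \frac{1}{|I|+1} I(E_{\alpha,i})$$

and hence

$$\mathbf{P}(A \setminus E_i) \leq \frac{1}{|I|+1} \mathbf{E}\Big(I(E_{\alpha,i}) \prod_{j \in I \setminus \{i\}} I(E_{\alpha,j})\Big) = \frac{1}{|I|+1} \mathbf{P}(A).$$

Inserting this back into (14) we conclude that $\mathbf{P}(A) \leq \frac{|I|}{|I|+1} \mathbf{P}(A)$, and hence that $A$ is a null event. By definition of $A$ and the UIP hypothesis, we may thus find regular events $F_{\alpha,i} \in \mathcal{B}_{\alpha,i}$ for all $i \in I$ with $\mathbf{P}(E_{\alpha,i} \setminus F_{\alpha,i}) < \varepsilon/2$ such that $\bigwedge_{i \in I} F_{\alpha,i} = \emptyset$. By the triangle inequality we thus have $\mathbf{P}(E_i \setminus F_{\alpha,i}) < \varepsilon$. The claim now follows by setting $F_i := F_{\alpha,i}$. ■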
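The two quantitative steps in the proof of Lemma 5.14 can be written out in full. The first line below combines Chebyshev's inequality with the $L^2$ bound on $I(E_i) - \mathbf{P}(E_i | \mathcal{B}_{\alpha,i})$; it in fact yields $\varepsilon/4$, which is more than the $\varepsilon/2$ needed. On $E_i \setminus E_{\alpha,i}$ one has $I(E_i) = 1$ and $\mathbf{P}(E_i | \mathcal{B}_{\alpha,i}) \leq \frac{|I|}{|I|+1}$, which justifies the first inequality. The second line combines (14) with the bound on $\mathbf{P}(A \setminus E_i)$.

$$\begin{aligned}
\mathbf{P}(E_i \setminus E_{\alpha,i})
  &\leq \mathbf{P}\Big( |I(E_i) - \mathbf{P}(E_i | \mathcal{B}_{\alpha,i})| \geq \tfrac{1}{|I|+1} \Big)
   \leq (|I|+1)^2 \, \| I(E_i) - \mathbf{P}(E_i | \mathcal{B}_{\alpha,i}) \|_{L^2}^2
   < (|I|+1)^2 \cdot \frac{\varepsilon}{4(|I|+1)^2} = \frac{\varepsilon}{4}, \\
\mathbf{P}(A)
  &\leq \sum_{i \in I} \mathbf{P}(A \setminus E_i)
   \leq \sum_{i \in I} \frac{1}{|I|+1} \mathbf{P}(A)
   = \frac{|I|}{|I|+1} \mathbf{P}(A).
\end{aligned}$$

Since $|I|/(|I|+1) < 1$, the second line forces $\mathbf{P}(A) = 0$.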

6. Proof of the infinitary hypergraph removal lemma 

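We are now ready to prove the hypergraph removal lemma. Fix the probability space $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$, the algebra $\mathcal{B}_{\mathrm{reg}}$ of regular events, the finite set $J$, the downset $\mathfrak{i}_{\max}$, and the factors $\mathcal{B}_e$ for $e \in \mathfrak{i}_{\max}$ obeying the hypotheses in Theorem 4.2. For any sub-ideal $\mathfrak{i}$ of $\mathfrak{i}_{\max}$, let $\mathcal{B}(\mathfrak{i})$ denote the factor $\mathcal{B}(\mathfrak{i}) := \bigvee_{e \in \mathfrak{i}} \mathcal{B}_e$; thus $\mathcal{B}(\mathfrak{i})$ is a regularisable factor. For any $e \in \mathfrak{i}_{\max}$, define the principal ideal $\langle e \rangle := \{e' : e' \subseteq e\}$; from the nesting property we see that $\mathcal{B}(\langle e \rangle) = \mathcal{B}_e$ for all $e \in \mathfrak{i}_{\max}$. Thus our task is to show that the tuple $(\mathcal{B}(\langle e \rangle))_{e \in \mathfrak{i}_{\max}}$ obeys the UIP. For inductive purposes, we will derive this claim from the following more general statement. For any downset $\mathfrak{i}$, we define the height $h(\mathfrak{i})$ of $\mathfrak{i}$ to be the quantity $h(\mathfrak{i}) := \sup\{|e| : e \in \mathfrak{i}\}$, with the convention that the empty ideal has height $-\infty$.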

Proposition 6.1. Let the hypotheses and notation be as above. Let $d \geq 0$, and let $(\mathfrak{i}_i)_{i \in I}$ be any finite tuple of sub-ideals of $\mathfrak{i}_{\max}$ (possibly with repetitions), such that every ideal $\mathfrak{i}_i$ has height at most $d$. Then the tuple $(\mathcal{B}(\mathfrak{i}_i))_{i \in I}$ obeys the UIP.

By taking $d$ sufficiently large (e.g. $d = |J|$) we obtain Theorem 4.2.

Proof We will prove Proposition 6.1 by an induction on $d$. First consider the base case $d = 0$. Then the only ideals available are the empty ideal $\{\}$ and the singleton ideal $\{\emptyset\}$; these correspond to the trivial factor $\{\emptyset, \Omega\}$ and the regularisable factor $\mathcal{B}_{\emptyset}$. The claim now follows from Examples 5.3, 5.4, 5.6 and Lemma 5.11.

Now suppose that $d \geq 1$, and that Proposition 6.1 has already been proven for $d - 1$. First observe from Lemma 5.11 that we may remove duplicates and assume that all the ideals $\mathfrak{i}_i$ are distinct.

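Given any $e \in \mathfrak{i}_{\max}$ with $|e| = d$, we know that $\mathcal{B}_e$ is regularisable, hence we may write $\mathcal{B}_e = \bigvee_{n \geq 1} \mathcal{B}_{e,n}$ for some increasing sequence $\mathcal{B}_{e,1} \subseteq \mathcal{B}_{e,2} \subseteq \ldots$ of regularisable finite $\sigma$-algebras. In particular, we have $\mathcal{B}(\mathfrak{i}_i) = \bigvee_{n \geq 1} \mathcal{B}_n(\mathfrak{i}_i)$ for $i \in I$, where

$$\mathcal{B}_n(\mathfrak{i}) := \bigvee_{e \in \mathfrak{i} : |e| = d} \mathcal{B}_{e,n} \vee \mathcal{B}(\underline{\mathfrak{i}})$$

and $\underline{\mathfrak{i}}$ is the downset $\underline{\mathfrak{i}} := \{e \in \mathfrak{i} : |e| < d\}$; note that this ideal has height strictly less than $d$.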
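The combinatorial notions in this proof are easy to compute for small examples: the height, the lower part $\underline{\mathfrak{i}}$, principal ideals, and the downset $\mathfrak{i}'$ appearing in Lemma 6.3 below. The following minimal sketch is our own illustration, with downsets stored as sets of frozensets; all function names are hypothetical.

```python
from itertools import combinations


def subsets(e):
    """All subsets of the finite set e, as frozensets."""
    items = sorted(e)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


def generated(*tops):
    """The downset generated by the given sets."""
    return set().union(*(subsets(t) for t in tops))


def height(ideal):
    """h(i) = sup{|e| : e in i}; the empty ideal has height -infinity."""
    return max((len(e) for e in ideal), default=float("-inf"))


def top(ideal, d):
    """Elements of the ideal of order exactly d."""
    return {e for e in ideal if len(e) == d}


def lower(ideal, d):
    """The downset {e in i : |e| < d}."""
    return {e for e in ideal if len(e) < d}


def is_principal(ideal):
    """True if the ideal equals <e> = {e' : e' subset of e} for some e in it."""
    return any(subsets(e) == ideal for e in ideal)


d = 2
i_0 = generated({1, 2}, {2, 3})                    # height 2, not principal
print(height(i_0), height(set()))                  # 2 -inf
print(sorted(sorted(e) for e in lower(i_0, d)))    # [[], [1], [2], [3]]
print(is_principal(i_0), is_principal(generated({1, 2})))  # False True

# The downset i' of Lemma 6.3, for the tuple below and the index i = 0.
ideals = [i_0, generated({1, 3}), generated({1, 2})]
i_prime = set().union(*ideals) - top(ideals[0], d)
print(top(i_prime, d) & top(ideals[0], d))         # set()
```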

We need some relative independence properties of these factors. We begin with

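Lemma 6.2. Let $\mathfrak{i}, \mathfrak{i}'$ be sub-ideals of $\mathfrak{i}_{\max}$ of height at most $d$ which do not have any common elements of order exactly $d$. Then $\mathcal{B}(\mathfrak{i})$ and $\mathcal{B}(\mathfrak{i}')$ are relatively independent over $\mathcal{B}(\underline{\mathfrak{i}})$.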

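Proof We will induct on the quantity $m := |\{e \in \mathfrak{i} : |e| = d\}|$, the number of top-order elements in $\mathfrak{i}$. If $m = 0$ then $\mathfrak{i} = \underline{\mathfrak{i}}$ and the claim follows. Now suppose that $m \geq 1$ and the claim has already been established for $m - 1$. Let $e_d$ be an element of $\mathfrak{i}$ with $|e_d| = d$, and let $\mathfrak{j} := \mathfrak{i} \setminus \{e_d\}$, which is again a downset with $\underline{\mathfrak{j}} = \underline{\mathfrak{i}}$. From the induction hypothesis we already know that $\mathcal{B}(\mathfrak{j})$ and $\mathcal{B}(\mathfrak{i}')$ are relatively independent over $\mathcal{B}(\underline{\mathfrak{i}})$. Also, from the independence hypothesis in Theorem 4.2 we know that $\mathcal{B}(\langle e_d \rangle)$ and $\mathcal{B}(\mathfrak{j}) \vee \mathcal{B}(\mathfrak{i}')$ are relatively independent over $\mathcal{B}(\underline{\mathfrak{i}})$. Applying the gluing property (Proposition A.27(i)) we conclude that the factors $\mathcal{B}(\mathfrak{j}) \vee \mathcal{B}(\langle e_d \rangle)$ and $\mathcal{B}(\mathfrak{i}')$ are relatively independent over $\mathcal{B}(\underline{\mathfrak{i}})$. Since the former factor is nothing more than $\mathcal{B}(\mathfrak{i})$, the claim follows. ■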

As a consequence, we have 

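Lemma 6.3. Let $i \in I$ and $n \geq 1$. Then $\mathcal{B}(\mathfrak{i}_i)$ and $\bigvee_{j \in I : j \neq i} \mathcal{B}_n(\mathfrak{i}_j)$ are relatively independent over $\mathcal{B}_n(\mathfrak{i}_i)$.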

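Proof Observe that $\bigvee_{j \in I : j \neq i} \mathcal{B}_n(\mathfrak{i}_j)$ is a factor of $\mathcal{B}_n(\mathfrak{i}_i) \vee \mathcal{B}(\mathfrak{i}')$, where $\mathfrak{i}'$ is the downset $\mathfrak{i}' := (\bigcup_{j \in I} \mathfrak{i}_j) \setminus \{e \in \mathfrak{i}_i : |e| = d\}$. Thus by monotonicity and absorption (Proposition A.27(i), (ii)) it suffices to show that $\mathcal{B}(\mathfrak{i}_i)$ and $\mathcal{B}(\mathfrak{i}')$ are relatively independent over $\mathcal{B}_n(\mathfrak{i}_i)$. Since factors do not affect relative independence (Proposition A.27(iv)), it suffices to show that $\mathcal{B}(\mathfrak{i}_i)$ and $\mathcal{B}(\mathfrak{i}')$ are relatively independent over $\mathcal{B}(\underline{\mathfrak{i}_i})$. But this follows from Lemma 6.2. ■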

From the above lemma and Lemma 5.14, we see that to close the induction hypothesis it suffices to show that $(\mathcal{B}_n(\mathfrak{i}_i))_{i \in I}$ obeys the UIP for all $n \geq 1$.

First suppose that all the ideals $\mathfrak{i}_i$ have height strictly less than $d$. Then $\mathcal{B}_n(\mathfrak{i}_i) = \mathcal{B}(\mathfrak{i}_i)$, and the claim follows from the induction hypothesis.

Now suppose that all the ideals $\mathfrak{i}_i$ either have height strictly less than $d$, or are principal ideals (this is a "weakly mixing" case). We induct on the number of principal ideals of height $d$. If there are no such ideals, then we are done by the preceding paragraph. Since we have removed duplicates, we know that no two principal ideals present have any common elements of top order $d$. Thus if $\mathfrak{i}_i$ is a principal ideal of height $d$, then $\mathcal{B}_n(\mathfrak{i}_i)$ is relatively independent of $\bigvee_{j \in I : j \neq i} \mathcal{B}_n(\mathfrak{i}_j)$ over $\mathcal{B}(\underline{\mathfrak{i}_i})$. Applying Lemma 5.12, it suffices for the purposes of checking the UIP to replace $\mathcal{B}_n(\mathfrak{i}_i)$ with $\mathcal{B}(\underline{\mathfrak{i}_i})$. But this follows from the (inner) induction hypothesis.

Finally, we consider the general case. Let k denote the number of ideals $\mathfrak{i}_i$ of height d which are not principal. We have already dealt with the case $k = 0$, so suppose inductively that $k \geq 1$ and the claim has already been proven for $k - 1$. Let $\mathfrak{i}_{i_0}$ be an ideal of height d which is not principal, and let $e_1, \ldots, e_l$ be the elements of $\mathfrak{i}_{i_0}$ of order d. We can then split

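$$\mathcal{B}_n(\mathfrak{i}_{i_0}) = \mathcal{B}_n(\langle e_1 \rangle) \vee \cdots \vee \mathcal{B}_n(\langle e_l \rangle) \vee \mathcal{B}(\mathfrak{i}_{i_0}).$$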

Also observe that for 1 < j < / we have Bn({ej)) = B^^.n'^ B{{ej)), and that B{{ej)) 
is a factor of $\mathcal{B}(\mathfrak{i}_{i_0})$. Thus we may apply Lemma 5.13 and conclude that in order to prove the UIP for $(\mathcal{B}_n(\mathfrak{i}_i))_{i \in I}$, it suffices to do so for the tuple

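$$(\mathcal{B}_n(\mathfrak{i}_i))_{i \in I \setminus \{i_0\}} \uplus (\mathcal{B}_n(\langle e_1 \rangle), \ldots, \mathcal{B}_n(\langle e_l \rangle), \mathcal{B}(\mathfrak{i}_{i_0})).$$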

This tuple has one fewer non-principal degree d ideal than the original tuple, and 
so the claim now follows from the (inner) induction hypothesis. ■ 



7. A HYPERGRAPH CORRESPONDENCE PRINCIPLE 

We now generalise the graph correspondence principle developed in Section 3 to hypergraphs. To keep the exposition somewhat simple we shall restrict our attention to the principle for a single d-uniform hypergraph, although there would be no difficulty extending this principle to systems of hypergraphs of varying uniformities and partiteness. The material here will be closely analogous to Section 3. Indeed, we could have deleted that section as being redundant, but we believe for pedagogical purposes that it is better to start with graphs before moving on to hypergraphs.

Definition 7.1 (Hypergraphs). Let $d \geq 0$. If V is a set, we let $\binom{V}{d} := \{e \subseteq V : |e| = d\}$ denote the d-element subsets of V. A d-uniform hypergraph is a pair $G = (V, E)$, where V is a non-empty set and $E \subseteq \binom{V}{d}$.

Note that a 2-uniform hypergraph is the same concept as an undirected graph. We will fix $d \geq 2$, and consider the question of extracting an infinitary limit from a sequence $G^{(m)} = (V^{(m)}, E^{(m)})$ of d-uniform hypergraphs. As before, we shall need a universal space, an embedding into that space, and a correspondence principle. We begin with the universal space.

Definition 7.2 (Hypergraph universal space). Fix $d \geq 2$. Let $\Omega := 2^{\binom{\mathbb{N}}{d}} = \{(\mathbb{N}, E_\infty) : E_\infty \subseteq \binom{\mathbb{N}}{d}\}$ denote the space of all infinite d-uniform hypergraphs $(\mathbb{N}, E_\infty)$ on the natural numbers. On this space $\Omega$, we introduce the events $A_e$ for all $e \in \binom{\mathbb{N}}{d}$ by $A_e := \{(\mathbb{N}, E_\infty) \in \Omega : e \in E_\infty\}$, and let $\mathcal{B}_{\max}$ be the σ-algebra generated by the $A_e$. We also introduce the regular algebra $\mathcal{B}_{\mathrm{reg}}$ generated by the $A_e$; thus these are the events that depend only on finitely many of the $A_e$. For any $\sigma \in S_\infty$, we define the associated action on $\mathcal{B}_{\max}$ by mapping $\sigma: A_e \mapsto A_{\sigma(e)}$ and extending this to a σ-algebra isomorphism in the unique manner. For any (possibly infinite) subset I of $\mathbb{N}$, we define $\mathcal{B}_I$ to be the factor of $\mathcal{B}_{\max}$ generated by the events $A_e$ for $e \in \binom{I}{d}$.

Next, we need a way to embed every finite hypergraph into the universal space.

Definition 7.3 (Hypergraph universal embedding). Fix $d \geq 2$. Let $m \geq 1$, and let $G^{(m)} = (V^{(m)}, E^{(m)})$ be a finite d-uniform hypergraph. Let $(\Omega^{(m)}, \mathcal{B}^{(m)}_{\max}, \mathbf{P}^{(m)})$ be the probability space corresponding to the sampling of a countable sequence of iid random variables $x_1^{(m)}, x_2^{(m)}, \ldots \in V^{(m)}$ sampled independently and uniformly at random. To every sequence $(x_1^{(m)}, x_2^{(m)}, \ldots) \in \Omega^{(m)}$ we associate an infinite d-uniform hypergraph $G^{(m)}_\infty = (\mathbb{N}, E^{(m)}_\infty) \in \Omega$ by setting

$$E^{(m)}_\infty := \Big\{ e \in \binom{\mathbb{N}}{d} : \{x_i^{(m)} : i \in e\} \in E^{(m)} \Big\}.$$

This mapping from $\Omega^{(m)}$ to $\Omega$ is clearly measurable, since the inverse images of the generating events $A_e \in (\Omega, \mathcal{B}_{\max})$ are the events that $\{x_i^{(m)} : i \in e\}$ lies in $E^{(m)}$, which are certainly measurable in $\mathcal{B}^{(m)}_{\max}$. This allows us to extend the probability measure $\mathbf{P}^{(m)}$ from $(\mathcal{B}^{(m)}_{\max}, \Omega^{(m)})$ to the product space $(\mathcal{B}_{\max} \times \mathcal{B}^{(m)}_{\max}, \Omega \times \Omega^{(m)})$ in a canonical manner, identifying the events $A_e$ with the events $\{x_i^{(m)} : i \in e\} \in E^{(m)}$.

We shall abuse notation and refer to the extended measure also as $\mathbf{P}^{(m)}$.
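The sampling in Definition 7.3 can be simulated on a finite window of the universal space. The Python sketch below is only an illustration under our own assumptions: the truncation length N, the name `sample_universal_window`, and the encoding of edges as frozensets are hypothetical choices, since the definition itself samples an infinite sequence. It draws $x_1, \ldots, x_N$ uniformly from the vertex set and records which d-subsets e of {1, ..., N} satisfy $\{x_i : i \in e\} \in E^{(m)}$, that is, which of the generating events $A_e$ occur.

```python
import itertools
import random


def sample_universal_window(vertices, edges, d, N, rng):
    """Return (xs, window): the sample x_1..x_N and E_infinity restricted to {1..N}.

    `edges` is a set of frozensets of size d. N is a hypothetical truncation
    length; Definition 7.3 itself samples an infinite sequence.
    """
    verts = list(vertices)
    # x_i drawn independently and uniformly from the vertex set.
    xs = {i: rng.choice(verts) for i in range(1, N + 1)}
    window = set()
    for e in itertools.combinations(range(1, N + 1), d):
        image = frozenset(xs[i] for i in e)
        # If two indices share a vertex, the image has fewer than d elements
        # and cannot be an edge, so A_e fails.
        if image in edges:
            window.add(frozenset(e))
    return xs, window


# Usage: the complete 3-uniform hypergraph on 5 vertices.
V = range(5)
E = {frozenset(t) for t in itertools.combinations(V, 3)}
xs, window = sample_universal_window(V, E, 3, 6, random.Random(0))
```

For the complete hypergraph used here, $A_e$ occurs exactly when the vertices sampled at the indices in e are distinct, since a repeated vertex shrinks the image below d elements.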

As before we can verify the permutation invariance (4). By repeating the proof of the graph correspondence principle (Proposition 3.4) almost word-for-word, we obtain its counterpart for hypergraphs:

Theorem 7.4 (Hypergraph correspondence principle). Fix $d \geq 2$. For every $m \geq 1$, let $G^{(m)} = (V^{(m)}, E^{(m)})$ be a finite d-uniform hypergraph, and let $\mathbf{P}^{(m)}$ be as in Definition 7.3. Then there exists a subsequence $m_1 < m_2 < \ldots$ of m, and a probability measure $\mathbf{P}^{(\infty)}$ on the hypergraph universal space $(\Omega, \mathcal{B}_{\max})$, such that we have the weak convergence property (1) and the permutation invariance property (5). Furthermore, we have the following relative independence property: for any $I, I_1, \ldots, I_l \subseteq \mathbb{N}$ with $I \cap I_1 \cap \ldots \cap I_l$ infinite, the factors $\mathcal{B}_I$ and $\bigvee_{i=1}^{l} \mathcal{B}_{I_i}$ are relatively independent conditioning on $\bigvee_{i=1}^{l} \mathcal{B}_{I \cap I_i}$ with respect to this probability measure $\mathbf{P}^{(\infty)}$.

Similarly, by repeating the proof of Lemma 3.5 almost word for word we obtain 

Lemma 7.5 (Infinitary hypergraph regularity lemma). Fix $d \geq 2$, and let $\mathbf{P}$ be a probability measure on the hypergraph universal space $(\Omega, \mathcal{B}_{\max})$ which is permutation-invariant in the sense of (5). Then for any $I, I_1, \ldots, I_l \subseteq \mathbb{N}$ with $I \cap I_1 \cap \ldots \cap I_l$ infinite, the factors $\mathcal{B}_I$ and $\bigvee_{i=1}^{l} \mathcal{B}_{I_i}$ are relatively independent conditioning on $\bigvee_{i=1}^{l} \mathcal{B}_{I \cap I_i}$.

Remark 7.6. We should emphasise just how easily the regularity lemma has extended to the hypergraph case here. This is in contrast with the development of the finitary hypergraph regularity lemma, which has only been satisfactorily achieved
quite recently [19], [20], [12], [30] (with preliminary work in [5], [3], [6]). In the
author's view this is because the regularity lemma is a relatively "soft" component 
of the theory; in the infinitary framework, the "hard" components of the theory are 
now isolated in the three fundamental extension properties in Lemma 5.12, Lemma 
5.13, Lemma 5.14 (and to a lesser extent in Lemma 5.11). These three lemmas are 
roughly analogous to the "counting lemma" components of the hypergraph theory 
(although Lemma 5.14 also captures some of the nature of the "regularity lemma" 
component, and is the step which is most responsible for the extremely poor quantitative bounds in this theory). Unsurprisingly, it is also these three lemmas where one does the most non-trivial manipulation of small quantities such as $\varepsilon$. Fortunately, the infinitary setting allows one to isolate these epsilons from each other,
despite the fact that all three of these basic lemmas are used repeatedly in the 
proof of the infinitary hypergraph removal lemma (Theorem 4.2). If instead we 
expanded out all of these lemmas within the proof of Theorem 4.2, and allowed the 
various epsilons to mix together (with the order of quantifiers, etc. being carefully 
recorded), one would eventually end up with a complicated situation roughly analogous to those in the finitary proofs [19], [20], [22], [23], [12], [30] of the hypergraph removal lemma. Thus the infinitary perspective allows for a powerful encapsulation of distinct components of the argument which greatly cleans up and clarifies
the high-level structure of the proof, though the low-level components are, at a 
fundamental level, essentially the same as in the finitary approach. 

8. AN INFINITARY PROOF OF THE HYPERGRAPH REMOVAL LEMMA

We can now repeat the arguments from Section 4 to obtain the following hypergraph removal lemma of Nagle, Schacht, Rodl, and Skokan [19], [20], [22], [23] (proved independently by Gowers [12]; see also [30] for a later proof):

Theorem 8.1 (Hypergraph removal lemma). Fix $d \geq 2$, and let $G_0 = (V_0, E_0)$ be a d-uniform hypergraph. Let $G = (V, E)$ be a d-uniform hypergraph with $|V| = n$ vertices. Suppose that G contains fewer than $\delta n^{|V_0|}$ copies of $G_0$ for some $0 < \delta \leq 1$, or more precisely

$$|\{(x_i)_{i \in V_0} \in V^{V_0} : \{x_i : i \in e\} \in E \text{ for all } e \in E_0\}| < \delta n^{|V_0|}.$$

Then it is possible to delete $o_{\delta \to 0; G_0, d}(n^d)$ edges from G to create a d-uniform hypergraph G' which has no copies of $G_0$ whatsoever. Here the subscripting of the o() notation by $G_0, d$ indicates that the quantity $o_{\delta \to 0; G_0, d}(n^d)$, when divided by $n^d$, goes to zero as $\delta \to 0$ for each fixed $G_0, d$, but the decay rate is not uniform in $G_0, d$.
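The quantity bounded in the hypothesis of Theorem 8.1 counts all maps $i \mapsto x_i$ from $V_0$ to V (not necessarily injective) that send every edge of $G_0$ to an edge of G. A brute-force Python sketch may help fix this normalisation. It is feasible only for tiny examples, since it enumerates all $n^{|V_0|}$ tuples. The function names `count_copies` and `complete` and the frozenset encoding are our own choices.

```python
import itertools


def count_copies(V, E, V0, E0):
    """Number of tuples (x_i)_{i in V0} in V^{V0} with {x_i : i in e} in E for every e in E0.

    E and E0 are sets of frozensets. Brute force over all |V|^|V0| tuples.
    """
    V0 = list(V0)
    count = 0
    for values in itertools.product(list(V), repeat=len(V0)):
        x = dict(zip(V0, values))
        if all(frozenset(x[i] for i in e) in E for e in E0):
            count += 1
    return count


def complete(vertices, d):
    """Edge set of the complete d-uniform hypergraph on `vertices`."""
    return {frozenset(t) for t in itertools.combinations(vertices, d)}


# Triangles (d = 2) in K_4, and tetrahedra K_4^(3) in the complete 3-uniform
# hypergraph on 5 vertices: every counted tuple must use distinct vertices.
triangles = count_copies(range(4), complete(range(4), 2), range(3), complete(range(3), 2))
tetrahedra = count_copies(range(5), complete(range(5), 3), range(4), complete(range(4), 3))
```

Dividing by $n^{|V_0|}$ gives the density that the hypothesis compares with δ; for the tetrahedron example this is $120/625$.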

Remark 8.2. As with the triangle removal lemma, this lemma has previously only been proven via a hypergraph regularity lemma, followed by a counting lemma.
This is rather complicated; the shortest proof known (in [30]) is about 25 pages, 
and requires some quite delicate computations. While this current proof is arguably 
longer than the proof in [30], and certainly less elementary, there are far fewer 
computations involved, and we believe the argument here is more conceptually 
clear. This theorem has a number of applications, most notably in giving a proof 
not only of Szemeredi's theorem (Theorem 2.1) but also a multidimensional version 
due to Furstenberg and Katznelson [9]; see e.g. [23], [12], [31] for further discussion 
of this connection, and [21] for some more applications of this theorem. A variant 
of this theorem was also used in [31] to establish that the Gaussian primes contain
arbitrarily shaped constellations; we shall discuss this variant shortly. 

Proof (Sketch). This is basically a repetition of the proof of Lemma 4.1, so we sketch the main points only. Fix $d, G_0$. We can relabel $V_0$ to be $\{1, \ldots, n_0\}$ for some integer $n_0$; we can also easily assume that $E_0$ is non-empty. If the theorem failed, we argue much as in the proof of Lemma 4.1, with $\{1, \ldots, n_0\}$ playing the role of $\{1, 2, 3\}$ (and thus $\{n_0 + 1, n_0 + 2, \ldots\}$ playing the role of $\{4, 5, \ldots\}$). We apply the hypergraph correspondence principle to obtain an infinitary limiting system $(\Omega, \mathcal{B}_{\max}, \mathbf{P}^{(\infty)})$, and apply Theorem 4.2 with $J := \{1, \ldots, n_0\}$, $\mathfrak{i}_{\max} := \{e : e \subseteq e' \text{ for some } e' \in E_0\}$, $E_e$ set equal to $A_e$ if $e \in E_0$ and $E_e := \Omega$ otherwise (the latter happens precisely when $|e| < d$), and with $\mathcal{B}_e$ set equal to $\mathcal{B}_{e \cup \{n_0+1, n_0+2, \ldots\}}$ for all $e \in \mathfrak{i}_{\max}$. One then continues the argument as in Lemma 4.1 (with the factor 100 in (8) replaced by at least $2^d |E_0|$); the remainder of the proof proceeds with only the obvious minor changes. ■

Remark 8.3. These results have analogues for partite hypergraphs (see [30]) and 
are proven similarly, but we will not do so here; the main difference is that instead of sampling all vertices from a single vertex class, one samples countably many vertices from each vertex class (which also leads to a more complicated symmetry group than $S_\infty$). Just as the triangle removal lemma, Lemma 4.1, has a stronger version in Lemma 4.4 which gives a complexity bound on the approximating graph G', the hypergraph removal lemma given above also comes with a stronger version, in which the approximating hypergraph G' is no longer a subhypergraph of G, but can be described using a partition of $\binom{V}{d-1}$ into $O_{\delta, G_0, d}(1)$ components. We
will neither state nor prove this stronger version here (the proof is much the same 
as Lemma 4.4), but see [30] for an extremely similar statement (in the setting of 
partite hypergraphs rather than non-partite hypergraphs). This version played an 
important role in the result in [31] that the Gaussian primes contained arbitrarily 
shaped constellations. 

Appendix A. Review of probability theory 

In this appendix we review the notation and tools from probability that we shall 
need. There are two concepts here of particular importance: the concept of rel- 
ative independence of two or more factors in a probability space; and the ability 
to approximate complicated events or random variables by combinations of more 
elementary events or random variables. 

A.1. The algebra of events. A probability space has two major structures: the set-theoretic structure of its events, and the measure-theoretic structure of the
probability measure P. Because we will be dealing with multiple event spaces with 
a single probability measure, or multiple probability measures on a single event 
space, it will be conceptually clearer if we treat these two structures separately. We 
begin with the structure of the event spaces. For technical reasons it is convenient 
to restrict attention to countably generated spaces. 
Definition A.2 (Event spaces). An event space is a pair $(\Omega, \mathcal{B}_{\max})$, where the sample space $\Omega$ is a non-empty set (possibly infinite), and $\mathcal{B}_{\max}$ is a σ-algebra on $\Omega$, i.e. a collection of subsets of $\Omega$ which is closed under countable unions, intersections, and complements, and which contains the empty set and $\Omega$. We will also require that the σ-algebra $\mathcal{B}_{\max}$ be countably generated, thus there exists a countable sequence of events $E_1, E_2, \ldots \in \mathcal{B}_{\max}$ such that $\mathcal{B}_{\max}$ is the minimal σ-algebra containing all these events. We refer to elements of $\mathcal{B}_{\max}$ as (measurable) events; we abuse notation and identify properties P(x) of points $x \in \Omega$ with the associated event $\{x \in \Omega : P(x) \text{ is true}\}$, and refer to the event simply as P. If A and B are events, we use $A \vee B$ to denote the event that at least one of A and B is true (i.e. $A \vee B$ is the union of A and B) and $A \wedge B$ to denote the event that A and B are both true (i.e. $A \wedge B$ is the intersection of A and B). We also use $\overline{A}$ to denote the event that A is not true (thus $\overline{A} = \Omega \setminus A$).

Example A.3. If $\Omega$ is at most countable, the power-set event space $(\Omega, 2^\Omega)$ of a set $\Omega$ is achieved by setting $\mathcal{B}_{\max} := 2^\Omega := \{E : E \subseteq \Omega\}$ to be the power set of $\Omega$. (If $\Omega$ is uncountable, $2^\Omega$ is no longer countably generated.)

Definition A.4 (Factors). Let $(\Omega, \mathcal{B}_{\max})$ be an event space. A factor is a subset $\mathcal{B}$ of $\mathcal{B}_{\max}$ which is also a countably generated σ-algebra. More generally, we say that $\mathcal{B}_1$ is a factor of $\mathcal{B}_2$ (or $\mathcal{B}_2$ extends $\mathcal{B}_1$) if $\mathcal{B}_1, \mathcal{B}_2$ are both σ-algebras in $\mathcal{B}_{\max}$ and $\mathcal{B}_1 \subseteq \mathcal{B}_2$. We say that a factor is finite if it consists of only finitely many events, thus for instance the trivial factor $\{\emptyset, \Omega\}$ is finite. An event is $\mathcal{B}$-measurable if it lies in $\mathcal{B}$. A random variable is any function $f: \Omega \to \mathbb{R}$ with the property that the events $f \in V$ are $\mathcal{B}_{\max}$-measurable for all open sets V; if these events are in fact $\mathcal{B}$-measurable, we say that the random variable f is $\mathcal{B}$-measurable also. In particular, if an event E is $\mathcal{B}$-measurable, then its indicator variable $\mathbf{I}(E)$, defined to equal 1 when E is true and 0 otherwise, is also $\mathcal{B}$-measurable. If $\mathcal{E} \subseteq \mathcal{B}_{\max}$ is any collection of events, we let $\mathcal{B}[\mathcal{E}]$ denote the factor generated by these events (i.e. the intersection of all factors that contain $\mathcal{E}$). In particular, if E is a single event, we let $\mathcal{B}[E] = \{\emptyset, E, \overline{E}, \Omega\}$ denote the (finite) factor generated by E. Similarly, if X is a random variable taking finitely many values, we use $\mathcal{B}[X]$ to denote the factor generated by the events $X = c$, where c ranges over the range of X. We write $\mathcal{B}_1 \vee \mathcal{B}_2$ for $\mathcal{B}[\mathcal{B}_1 \cup \mathcal{B}_2]$, thus $\mathcal{B}_1 \vee \mathcal{B}_2$ is the least common extension of $\mathcal{B}_1$ and $\mathcal{B}_2$. More generally, we can define the least common extension $\bigvee_{\alpha \in A} \mathcal{B}_\alpha$ of any at most countable collection of factors $\mathcal{B}_\alpha$.

Example A.5 (Finite factors). Let $A_1, \ldots, A_n$ be a partition of the sample space $\Omega$ into disjoint non-empty events. Then $\mathcal{B} = \mathcal{B}[A_1, \ldots, A_n]$ is the finite factor consisting of all events which are the union of zero or more of the $A_i$ (and all finite factors are of this form). We refer to the $A_i$ as the atoms of $\mathcal{B}$. Let $i: \Omega \to \{1, \ldots, n\}$ be the random variable which indexes which atom one lies in, thus $x \in A_{i(x)}$ for all $x \in \Omega$. A random variable f is $\mathcal{B}$-measurable if and only if it is determined by i, thus $f(x) = F(i(x))$ for some function $F: \{1, \ldots, n\} \to \mathbb{R}$. One finite factor $\mathcal{B}_1$ extends another $\mathcal{B}_2$ if the partition into $\mathcal{B}_1$-atoms is finer than the partition into $\mathcal{B}_2$-atoms (thus every $\mathcal{B}_2$-atom is the union of $\mathcal{B}_1$-atoms).
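On a finite sample space, Example A.5 can be made concrete: a finite factor is determined by its atoms, its events are the unions of atoms, and extension is refinement of partitions. The Python sketch below uses hypothetical helper names of our own (`factor_events`, `extends`) and represents events as frozensets of sample points.

```python
from itertools import combinations


def factor_events(atoms):
    """All events of the finite factor B[A_1, ..., A_n]: unions of zero or more atoms."""
    atoms = [frozenset(a) for a in atoms]
    events = set()
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            events.add(frozenset().union(*chosen))
    return events


def extends(fine, coarse):
    """True if the factor with atoms `fine` extends the one with atoms `coarse`,
    i.e. every coarse atom is a union of fine atoms."""
    fine = [frozenset(a) for a in fine]
    for a in map(frozenset, coarse):
        inside = [b for b in fine if b <= a]
        if frozenset().union(*inside) != a:
            return False
    return True


# Omega = {0, ..., 5}
coarse = [{0, 1, 2}, {3, 4, 5}]
fine = [{0}, {1, 2}, {3, 4, 5}]
```

Here the factor generated by `coarse` has $2^2 = 4$ events and is contained in the factor generated by `fine`, which has $2^3 = 8$.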

We shall also need the notion of a (boolean) algebra, namely a subset $\mathcal{B}$ of $\mathcal{B}_{\max}$ which is closed under finite intersections, unions, and complements, and contains $\emptyset$ and $\Omega$. Thus every factor is an algebra, but not conversely. The reason we need to deal with algebras rather than factors is because of the observation that the algebra generated by a countable sequence of events remains countable (indeed it is nothing more than the collection of finite boolean combinations of events from that sequence), whereas the factor generated by the same sequence can be uncountable. This is important when applying the Arzelà-Ascoli diagonalisation argument (see Lemma A.15 below).

Example A.6. Let $\Omega = [0, 1)^2$, and let $\mathcal{B}_{\max}$ be the Borel σ-algebra (i.e. the σ-algebra generated by the open sets). Let $\mathcal{B}_{\mathrm{reg}}$ be the space of elementary sets, defined as the finite unions of half-open rectangles $[a, b) \times [c, d)$ where a, b, c, d are rational. Then $\mathcal{B}_{\mathrm{reg}}$ is an algebra but not a factor, and is countable; furthermore $\mathcal{B}_{\max}$ is generated by $\mathcal{B}_{\mathrm{reg}}$ as a σ-algebra.

A.7. Probability spaces. We now add the structure of a probability measure to an event space, to form a probability space.

Definition A.8 (Probability spaces). A probability space is a triplet $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$, where $(\Omega, \mathcal{B}_{\max})$ is an event space, and $\mathbf{P}: \mathcal{B}_{\max} \to [0, 1]$ is a probability measure, i.e. a countably additive non-negative measure on $\mathcal{B}_{\max}$ with $\mathbf{P}(\Omega) = 1$. A null event is an event of probability zero. A statement is true almost surely if it is only false on a null event.

Remark A.9. We do not assume our event space $(\Omega, \mathcal{B}_{\max})$ to be complete. Thus, it is not necessarily the case that any subset of a null event is still a measurable event. (It may help to think of the σ-algebras here as being like Borel σ-algebras, that is, σ-algebras generated by open sets, rather than Lebesgue σ-algebras.)

In the remainder of this appendix we assume that the probability space $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ is fixed.

Definition A.10 (Random variables). We consider two random variables equivalent if they are almost surely equal. If f is absolutely integrable, we use $\mathbf{E}(f)$ to denote the integral of f with respect to the probability measure $\mathbf{P}$, and write $\|f\|_{L^1} = \|f\|_{L^1(\mathcal{B}_{\max}; \mathbf{P})}$ for $\mathbf{E}(|f|)$. Thus for instance $\mathbf{E}(\mathbf{I}(E)) = \mathbf{P}(E)$ for any event E. Similarly, we write $\|f\|_{L^2} = \|f\|_{L^2(\mathcal{B}_{\max}; \mathbf{P})}$ for $\mathbf{E}(|f|^2)^{1/2}$ whenever f is square-integrable, and $\|f\|_{L^\infty} = \|f\|_{L^\infty(\mathcal{B}_{\max}; \mathbf{P})}$ for the essential supremum of $|f|$. We will drop the measure $\mathbf{P}$, and sometimes the factor $\mathcal{B}_{\max}$, from the $L^p(\mathcal{B}_{\max}; \mathbf{P})$ notation when these are clear from context.

It will be important to develop relative versions of all these concepts with respect to factors of $\mathcal{B}_{\max}$.

Definition A.11 (Conditional expectation). If $p = 1, 2, \infty$ and $\mathcal{B}$ is a factor, we let $L^p(\mathcal{B}) = L^p(\mathcal{B}; \mathbf{P})$ denote the space of $\mathcal{B}$-measurable random variables with finite $L^p$ norm (identifying two random variables if they are equivalent). Observe that $L^2(\mathcal{B})$ is a Hilbert space with inner product $\langle f, g \rangle := \mathbf{E}(fg)$; since $\mathcal{B}$ is countably generated, we see that $L^2(\mathcal{B})$ is separable. We define the conditional expectation operator $f \mapsto \mathbf{E}(f|\mathcal{B})$ to be the orthogonal projection from $L^2(\mathcal{B}_{\max})$ to $L^2(\mathcal{B})$; note that $\mathbf{E}(f|\mathcal{B})$ is only defined up to almost sure equivalence. If E is an event, we write $\mathbf{P}(E|\mathcal{B})$ for $\mathbf{E}(\mathbf{I}(E)|\mathcal{B})$, and refer to $\mathbf{P}(E|\mathcal{B})$ as the conditional probability of E with respect to the factor $\mathcal{B}$.
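We have the following useful result.

Lemma A.12 (Pythagoras' theorem). Let $\mathcal{B}'$ be an extension of $\mathcal{B}$. Then for any $f \in L^2(\mathcal{B}_{\max})$ we have

$$\|\mathbf{E}(f|\mathcal{B}')\|_{L^2}^2 = \|\mathbf{E}(f|\mathcal{B})\|_{L^2}^2 + \|\mathbf{E}(f|\mathcal{B}') - \mathbf{E}(f|\mathcal{B})\|_{L^2}^2.$$

Proof. This follows since $\mathbf{E}(f|\mathcal{B}')$ is the orthogonal projection to $L^2(\mathcal{B}')$, and $\mathbf{E}(f|\mathcal{B})$ is the orthogonal projection to the smaller space $L^2(\mathcal{B})$; in particular $\mathbf{E}(f|\mathcal{B}) = \mathbf{E}(\mathbf{E}(f|\mathcal{B}')|\mathcal{B})$, so $\mathbf{E}(f|\mathcal{B}') - \mathbf{E}(f|\mathcal{B})$ is orthogonal to $L^2(\mathcal{B})$. ■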
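On a finite probability space, conditional expectation onto a finite factor amounts to averaging over atoms with respect to the weights (compare Example A.14), and Lemma A.12 can be checked exactly. The Python sketch below is our own illustration: the weights, atoms, and f are arbitrary hypothetical choices, and `fractions.Fraction` is used so that the identity holds without rounding.

```python
from fractions import Fraction


def cond_exp(f, prob, atoms):
    """E(f|B) for the finite factor B with the given atoms (each of positive mass)."""
    out = {}
    for atom in atoms:
        mass = sum(prob[x] for x in atom)
        avg = sum(prob[x] * f[x] for x in atom) / mass
        for x in atom:
            out[x] = avg
    return out


def l2_sq(g, prob):
    """Squared L^2 norm E(|g|^2)."""
    return sum(prob[x] * g[x] ** 2 for x in prob)


prob = {x: Fraction(x + 1, 36) for x in range(8)}      # weights 1/36, ..., 8/36
f = {0: 3, 1: -1, 2: 4, 3: 1, 4: -5, 5: 9, 6: 2, 7: -6}
B = [{0, 1, 2, 3}, {4, 5, 6, 7}]                       # coarse factor
B_ext = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]               # B' extends B

E_B = cond_exp(f, prob, B)
E_Bext = cond_exp(f, prob, B_ext)
diff = {x: E_Bext[x] - E_B[x] for x in prob}
```

Because every quantity is an exact fraction, `l2_sq(E_Bext, prob)` equals `l2_sq(E_B, prob) + l2_sq(diff, prob)` exactly, and conditioning `E_Bext` further onto `B` recovers `E_B`.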

Remark A.13. In this paper we shall deal almost exclusively with bounded random variables (indeed, they will almost always be bounded between −1 and 1). Thus issues of integrability will not be a concern to us; this also means that we do not have to distinguish between convergence in $L^1$, convergence in $L^2$, and convergence in measure. It will however be crucial to keep track of the measurability of our random variables with respect to the various factors involved in the argument.

Example A.14 (Finite factors). Let $\mathcal{B}$ be a finite factor with atoms $A_1, \ldots, A_n$. If $f \in L^\infty(\mathcal{B}_{\max})$, the conditional expectation $\mathbf{E}(f|\mathcal{B})$ is well-defined on all atoms of non-zero probability, and is equal to $\mathbf{E}(f|A_i) := \mathbf{E}(f\mathbf{I}(A_i))/\mathbf{P}(A_i)$ on each such atom. Similarly we have $\mathbf{P}(E|\mathcal{B}) = \mathbf{P}(E|A_i) := \mathbf{P}(E \wedge A_i)/\mathbf{P}(A_i)$ on such atoms. Of course one can develop similar explicit formulae for the conditional covariance of two random variables or events.
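A small worked instance of Example A.14, in a Python sketch of our own: a fair die, the finite factor generated by parity, and the event that the roll is at least 4. On the odd atom {1, 3, 5} the conditional probability is 1/3, and on the even atom {2, 4, 6} it is 2/3.

```python
from fractions import Fraction


def cond_prob(event, atoms, prob):
    """P(E|B) pointwise: on each atom A_i of positive mass, P(E and A_i) / P(A_i).

    Points in atoms of zero mass are left out, since P(E|B) is only defined
    up to null events there.
    """
    result = {}
    for atom in atoms:
        mass = sum(prob[x] for x in atom)
        if mass == 0:
            continue
        value = sum(prob[x] for x in atom if x in event) / mass
        for x in atom:
            result[x] = value
    return result


die = {x: Fraction(1, 6) for x in range(1, 7)}
parity = [{1, 3, 5}, {2, 4, 6}]
high = {4, 5, 6}
p = cond_prob(high, parity, die)
```

Averaging `p` against the die recovers $\mathbf{P}(\text{roll} \geq 4) = 1/2$, consistent with $\mathbf{E}(\mathbf{P}(E|\mathcal{B})) = \mathbf{P}(E)$.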



We recall some very standard properties of conditional expectation, which we shall use without further comment. The conditional expectation operation $f \mapsto \mathbf{E}(f|\mathcal{B})$ is linear, positivity preserving, and is a contraction on $L^p$ for $p = 1, 2, \infty$. In particular conditional expectation is continuous in each of these topologies, which allows us to easily apply density arguments when verifying identities involving conditional expectation (i.e. it suffices to verify such identities for a dense subclass of random variables, such as simple random variables). We also have the module property that $\mathbf{E}(fg|\mathcal{B}) = f\mathbf{E}(g|\mathcal{B})$ whenever $f \in L^\infty(\mathcal{B})$ and $g \in L^\infty(\mathcal{B}_{\max})$.
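These properties are easy to test numerically on a finite space. The following Python sketch is our own example, and the space, atoms, and functions are arbitrary hypothetical choices. It checks the module property $\mathbf{E}(fg|\mathcal{B}) = f\mathbf{E}(g|\mathcal{B})$ for an f that is constant on the atoms of $\mathcal{B}$, together with the $L^2$ contraction bound.

```python
from fractions import Fraction


def cond_exp(f, atoms, prob):
    """E(f|B) for the finite factor B with the given atoms (each of positive mass)."""
    out = {}
    for atom in atoms:
        mass = sum(prob[x] for x in atom)
        avg = sum(prob[x] * f[x] for x in atom) / mass
        for x in atom:
            out[x] = avg
    return out


prob = {x: Fraction(1, 6) for x in range(6)}
atoms = [{0, 1, 2}, {3, 4, 5}]
f = {0: 2, 1: 2, 2: 2, 3: -1, 4: -1, 5: -1}   # B-measurable: constant on each atom
g = {0: 1, 1: 4, 2: -2, 3: 0, 4: 3, 5: 5}     # arbitrary bounded random variable

lhs = cond_exp({x: f[x] * g[x] for x in prob}, atoms, prob)   # E(fg|B)
Eg = cond_exp(g, atoms, prob)
rhs = {x: f[x] * Eg[x] for x in prob}                         # f E(g|B)
norm_sq_Eg = sum(prob[x] * Eg[x] ** 2 for x in prob)          # ||E(g|B)||_{L^2}^2
norm_sq_g = sum(prob[x] * g[x] ** 2 for x in prob)            # ||g||_{L^2}^2
```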

In order to pass from a sequence of finitary objects to an infinitary one, the following 
lemma will be crucial. 

Lemma A.15 (Arzelà-Ascoli diagonalisation argument). Let $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \ldots$ be a sequence of probability measures on an event space $(\Omega, \mathcal{B}_{\max})$. Let $\mathcal{B}_{\mathrm{reg}}$ be a countable algebra which generates $\mathcal{B}_{\max}$ as a σ-algebra. Then there exists a subsequence $k_1 < k_2 < \ldots$ of integers and a probability measure $\mathbf{P}$ such that

$$\lim_{i \to \infty} \mathbf{P}^{(k_i)}(F) = \mathbf{P}(F) \quad \text{for all } F \in \mathcal{B}_{\mathrm{reg}}.$$

In other words, $\mathbf{P}^{(k_i)}$ is weakly convergent to $\mathbf{P}$, when tested against the algebra of events $\mathcal{B}_{\mathrm{reg}}$.



Proof. We enumerate $\mathcal{B}_{\mathrm{reg}}$ as $F_1, F_2, \ldots$ (duplicating events if necessary, if $\mathcal{B}_{\mathrm{reg}}$ happens to be finite). By using the sequential compactness of the unit interval [0, 1] (i.e. the Heine-Borel theorem), we can obtain a sequence $k_{1,1} < k_{1,2} < k_{1,3} < \ldots$ such that $\mathbf{P}^{(k_{1,i})}(F_1)$ converges as $i \to \infty$ to a limit, say $p_1 \in [0, 1]$. Then we can extract a subsequence $k_{2,1} < k_{2,2} < k_{2,3} < \ldots$ of that sequence such that $\mathbf{P}^{(k_{2,i})}(F_2)$ converges as $i \to \infty$ to a limit, say $p_2 \in [0, 1]$. We continue in this fashion and then extract the diagonal sequence $k_i := k_{i,i}$ to obtain a sequence $p_1, p_2, \ldots \in [0, 1]$ such that $\lim_{i \to \infty} \mathbf{P}^{(k_i)}(F_j) = p_j$ for each $j = 1, 2, \ldots$. One easily verifies that the map $F_j \mapsto p_j$ is finitely additive, non-negative, and maps $\emptyset$ to 0 and $\Omega$ to 1. Invoking the Kolmogorov extension theorem (or the Carathéodory extension theorem) we can construct a probability measure $\mathbf{P}$ such that $\mathbf{P}(F_j) = p_j$, and the claim follows. ■

Remark A. 16. One can also obtain this lemma from the Banach-Alaoglu theorem 
and the Riesz representation theorem (though one should take care to distinguish 
the notions of compactness and sequential compactness). Observe that both the 
Heine-Borel theorem and the Kolmogorov extension theorem are completely con- 
structive, and so this lemma does not use the axiom of choice. See [32] for further 
discussion. 

A.17. Approximation lemmas. We will frequently need to approximate a random variable or event in a complicated factor by linear, polynomial, or boolean
combinations of random variables or events in simpler factors. To do this we shall 
use some very simple and standard tools, which we collect here for the reader's 
convenience. 

Recall that a random variable is simple if it only takes on finitely many values, or 
equivalently if it is the finite linear combination of indicator functions, or equiv- 
alently if it is measurable with respect to a finite factor. The following lemma is 
standard in measure theory: 

Lemma A.18. Let $\mathcal{B}$ be a factor and $p = 1, 2, \infty$. Then the simple random variables in $L^p(\mathcal{B})$ are dense in $L^p(\mathcal{B})$.

Because of this, the task of approximating random variables quickly boils down to approximating events. Let us say that two events E, F are $\varepsilon$-close if $\mathbf{P}(E \setminus F) + \mathbf{P}(F \setminus E) < \varepsilon$.
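To illustrate ε-closeness, take $\Omega = [0, 1)$ with Lebesgue measure and $E = [0, t)$. Let $F_n$ be the union of the dyadic intervals $[k2^{-n}, (k+1)2^{-n})$ contained in E, so that $F_n = [0, \lfloor t 2^n \rfloor 2^{-n})$. Then $\mathbf{P}(E \setminus F_n) + \mathbf{P}(F_n \setminus E) < 2^{-n}$, so E is ε-close to an elementary set for every ε > 0. The Python sketch below checks this numerically for the arbitrary choice $t = 1/\sqrt{2}$; the helper name is our own.

```python
import math


def closeness_interval(t, s):
    """P(E minus F) + P(F minus E) for E = [0, t), F = [0, s) in [0, 1) with Lebesgue measure."""
    return abs(t - s)


t = 1 / math.sqrt(2)
gaps = []
for n in range(1, 21):
    # F_n = [0, floor(t 2^n) / 2^n) is a finite union of dyadic intervals inside E.
    s = math.floor(t * 2 ** n) / 2 ** n
    gaps.append((n, closeness_interval(t, s)))
```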

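Lemma A.19 (Approximation by finite complexity events). Let $\mathcal{B} = \mathcal{B}[\mathcal{E}]$ be a factor generated by a (possibly infinite) collection $\mathcal{E}$ of events, and let $\varepsilon > 0$. Then every event in $\mathcal{B}$ is $\varepsilon$-close to a finite boolean combination of events from $\mathcal{E}$. In particular, if $\mathcal{B}$ is generated by an algebra $\mathcal{B}_{\mathrm{reg}}$, then every event in $\mathcal{B}$ is $\varepsilon$-close to an event from $\mathcal{B}_{\mathrm{reg}}$. If $f \in L^1(\mathcal{B})$, then there exists a finite factor $\mathcal{B}'$ of $\mathcal{B}$ generated by finitely many events in $\mathcal{E}$, such that $\|f - \mathbf{E}(f|\mathcal{B}')\|_{L^1(\mathcal{B})} < \varepsilon$.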

Proof. Let $\mathcal{B}_{\mathrm{reg}}$ be the algebra generated by $\mathcal{E}$ (i.e. the space of finite boolean combinations of events from $\mathcal{E}$). Let $\mathcal{B}_\varepsilon$ denote the collection of events which are $\varepsilon$-close to an element of $\mathcal{B}_{\mathrm{reg}}$. Then one easily verifies that $\bigcap_{\varepsilon > 0} \mathcal{B}_\varepsilon$ is a factor that contains $\mathcal{E}$, and thus contains $\mathcal{B}$, and the first and second claims follow. To prove the final claim, first use Lemma A.18 to reduce to the case where f is simple, and then use linearity to reduce to the case when $f = \mathbf{I}(E)$ is an indicator function. By the previous claims, we can find an event $E' \in \mathcal{B}_{\mathrm{reg}}$ which is $\varepsilon/2$-close to E, thus $\|f - \mathbf{I}(E')\|_{L^1(\mathcal{B})} < \varepsilon/2$. This $E'$ lies in some finite factor $\mathcal{B}'$ generated by finitely many events in $\mathcal{E}$, and thus on taking conditional expectations in $\mathcal{B}'$ we see that $\|\mathbf{E}(f|\mathcal{B}') - \mathbf{I}(E')\|_{L^1(\mathcal{B})} < \varepsilon/2$. The claim now follows from the triangle inequality. ■

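Corollary A.20 (Limits of chains). Let A be a totally ordered set, and for each $\alpha \in A$ let $\mathcal{B}_\alpha$ be a factor of $\mathcal{B}_{\max}$ with the monotonicity property $\mathcal{B}_\alpha \subseteq \mathcal{B}_\beta$ whenever $\alpha < \beta$. Let $\mathcal{B} := \bigvee_{\alpha \in A} \mathcal{B}_\alpha$. Then for any $f \in L^2(\mathcal{B})$, the net $\mathbf{E}(f|\mathcal{B}_\alpha)$ converges to f in $L^2$ norm (thus for every $\varepsilon > 0$ there exists $\beta \in A$ such that $\|f - \mathbf{E}(f|\mathcal{B}_\alpha)\|_{L^2(\mathcal{B})} < \varepsilon$ for all $\alpha \geq \beta$).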

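Proof. Let $\varepsilon > 0$. Applying Lemma A.19 with $\mathcal{E} = \bigcup_{\alpha \in A} \mathcal{B}_\alpha$, we can find a finite factor $\mathcal{B}'$ generated by finitely many events in $\mathcal{E}$ such that $\|f - \mathbf{E}(f|\mathcal{B}')\|_{L^2(\mathcal{B})} < \varepsilon$. By monotonicity we see that $\mathcal{B}'$ is a factor of $\mathcal{B}_\beta$ for some $\beta \in A$, and hence of $\mathcal{B}_\alpha$ for every $\alpha \geq \beta$. The claim then follows from Pythagoras' theorem, since $\mathbf{E}(f|\mathcal{B}_\alpha)$ is the orthogonal projection of f onto a space containing $L^2(\mathcal{B}')$, so that $\|f - \mathbf{E}(f|\mathcal{B}_\alpha)\|_{L^2} \leq \|f - \mathbf{E}(f|\mathcal{B}')\|_{L^2}$. ■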
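The following is a discretised illustration of Corollary A.20, under our own simplifying assumptions. We replace [0, 1) by the $2^{10}$ points $k/2^{10}$ with uniform weights and take the chain $\mathcal{B}_0 \subseteq \mathcal{B}_1 \subseteq \ldots \subseteq \mathcal{B}_{10}$, where $\mathcal{B}_n$ is generated by the dyadic blocks of length $2^{-n}$. We then track $\|f - \mathbf{E}(f|\mathcal{B}_n)\|_{L^2}$ for a sample function f of our choosing. By Pythagoras' theorem these errors are non-increasing in n, and at the finest level the error vanishes. The Python sketch below is illustrative only.

```python
import math

M = 2 ** 10  # assumed discretisation: Omega = {k / M : 0 <= k < M}, uniform weights
points = [k / M for k in range(M)]
f = [math.sin(2 * math.pi * x) + x for x in points]


def cond_exp_dyadic(values, n):
    """E(f|B_n), where B_n is generated by the 2^n dyadic blocks of equal length."""
    block = len(values) // 2 ** n
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        avg = sum(chunk) / len(chunk)
        out.extend([avg] * len(chunk))
    return out


def l2_dist(u, v):
    """L^2 distance under the uniform measure on the sample points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


# Error of E(f|B_n) along the chain B_0 <= B_1 <= ... <= B_10.
errors = [l2_dist(f, cond_exp_dyadic(f, n)) for n in range(11)]
```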

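Corollary A.21 (Approximation by finite factors). Let $\mathcal{B}_1, \ldots, \mathcal{B}_k$ be factors and $\varepsilon > 0$. Then every event in $\mathcal{B}_1 \vee \ldots \vee \mathcal{B}_k$ is $\varepsilon$-close to a finite boolean combination of events in $\mathcal{B}_1 \cup \ldots \cup \mathcal{B}_k$. Furthermore, given any random variable $f \in L^\infty(\mathcal{B}_1 \vee \ldots \vee \mathcal{B}_k)$, there exist finite factors $\mathcal{B}'_i$ of $\mathcal{B}_i$ for $i = 1, \ldots, k$ respectively such that $\|f - \mathbf{E}(f|\mathcal{B}'_1 \vee \ldots \vee \mathcal{B}'_k)\|_{L^1(\mathcal{B}_1 \vee \ldots \vee \mathcal{B}_k)} < \varepsilon$.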

Proof. The first claim follows from Lemma A.19 by setting $\mathcal{E} := \mathcal{B}_1 \cup \ldots \cup \mathcal{B}_k$. To verify the second claim, first use Lemma A.19 to locate a finite factor $\mathcal{B}'$ generated by finitely many elements in $\mathcal{B}_1 \cup \ldots \cup \mathcal{B}_k$ such that $\|f - \mathbf{E}(f|\mathcal{B}')\|_{L^1(\mathcal{B}_1 \vee \ldots \vee \mathcal{B}_k)} < \varepsilon/2$. Now observe that $\mathcal{B}'$ is a factor of $\mathcal{B}'_1 \vee \ldots \vee \mathcal{B}'_k$ for some finite factors $\mathcal{B}'_i$ of $\mathcal{B}_i$ for $i = 1, \ldots, k$. The claim now follows from the same triangle inequality argument used to prove Lemma A.19. ■



A.22. Relative independence. Now we come to a fundamental notion for us, namely that of (relative) independence of two or more factors.

Definition A.23 (Independence). We say that two factors $\mathcal{B}_1, \mathcal{B}_2$ are unconditionally independent if we have

$$\mathbf{E}(f_1 f_2) = \mathbf{E}(f_1)\mathbf{E}(f_2)$$

for all $f_1 \in L^\infty(\mathcal{B}_1)$ and $f_2 \in L^\infty(\mathcal{B}_2)$. More generally, we say that two factors $\mathcal{B}_1, \mathcal{B}_2$ are relatively independent conditioning on a third factor $\mathcal{B}$ with respect to the probability measure $\mathbf{P}$ if we have

$$\mathbf{E}(f_1 f_2|\mathcal{B}) = \mathbf{E}(f_1|\mathcal{B})\mathbf{E}(f_2|\mathcal{B}) \qquad (15)$$

almost surely for all $f_1 \in L^\infty(\mathcal{B}_1)$ and $f_2 \in L^\infty(\mathcal{B}_2)$. In many cases, the probability measure $\mathbf{P}$ will be clear from context and we shall omit the phrase "with respect to $\mathbf{P}$". Given an at most countable collection of factors $(\mathcal{B}_\alpha)_{\alpha \in A}$, we say that these factors are jointly unconditionally independent (resp. jointly relatively independent conditioning on a factor $\mathcal{B}$) if $\bigvee_{\alpha \in A_1} \mathcal{B}_\alpha$ and $\bigvee_{\alpha \in A_2} \mathcal{B}_\alpha$ are unconditionally independent (resp. relatively independent conditioning on $\mathcal{B}$) for all disjoint subsets $A_1, A_2$ of A. We say that a collection of events $E_1, E_2, \ldots$ is unconditionally independent (resp. relatively independent conditioning on $\mathcal{B}$) if their associated factors $\mathcal{B}[E_1], \mathcal{B}[E_2], \ldots$ are unconditionally independent (resp. relatively independent conditioning on $\mathcal{B}$).
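For finite factors, (15) only needs to be checked on indicator functions of atoms, by linearity. On each atom C of $\mathcal{B}$ with $\mathbf{P}(C) > 0$ one needs $\mathbf{P}(A_1 \wedge A_2 \wedge C)\mathbf{P}(C) = \mathbf{P}(A_1 \wedge C)\mathbf{P}(A_2 \wedge C)$ for all atoms $A_1$ of $\mathcal{B}_1$ and $A_2$ of $\mathcal{B}_2$. The Python sketch below implements this exact check. The function names and the two-coin example space are our own illustrative choices.

```python
from fractions import Fraction
from itertools import product


def mass(event, prob):
    """Probability of an event given as a set of sample points."""
    return sum(prob[w] for w in event)


def relatively_independent(atoms1, atoms2, cond_atoms, prob):
    """Check (15) for finite factors given by their atoms.

    On every atom C of the conditioning factor with P(C) > 0 we require
    P(A1 & A2 & C) * P(C) == P(A1 & C) * P(A2 & C) for all atoms A1, A2.
    """
    for c in cond_atoms:
        pc = mass(c, prob)
        if pc == 0:
            continue
        for a1, a2 in product(atoms1, atoms2):
            if mass(a1 & a2 & c, prob) * pc != mass(a1 & c, prob) * mass(a2 & c, prob):
                return False
    return True


# Two fair coins: Omega = {0, 1}^2 with uniform weights.
omega = list(product([0, 1], repeat=2))
prob = {w: Fraction(1, 4) for w in omega}


def coord_atoms(i):
    """Atoms of the factor generated by the i-th coordinate."""
    return [{w for w in omega if w[i] == v} for v in (0, 1)]


trivial = [set(omega)]
```

Two distinct coordinate factors pass the check with the trivial conditioning factor. A coordinate factor is not independent of itself, but it becomes relatively independent of itself once one conditions on it.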
Examples A.24. Two factors $\mathcal{B}_1, \mathcal{B}_2$ are unconditionally independent if and only if $\mathbf{P}(E \wedge F) = \mathbf{P}(E)\mathbf{P}(F)$ for all $E \in \mathcal{B}_1$ and $F \in \mathcal{B}_2$. In particular, two events E and F are unconditionally independent if and only if $\mathbf{P}(E \wedge F) = \mathbf{P}(E)\mathbf{P}(F)$. Three factors $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}_3$ are jointly unconditionally independent if and only if $\mathbf{P}(E_1 \wedge E_2 \wedge E_3) = \mathbf{P}(E_1)\mathbf{P}(E_2)\mathbf{P}(E_3)$ for all $E_1 \in \mathcal{B}_1$, $E_2 \in \mathcal{B}_2$, and $E_3 \in \mathcal{B}_3$. On the other hand, in order for three events E, F, G to be jointly independent it is not quite enough that $\mathbf{P}(E \wedge F \wedge G) = \mathbf{P}(E)\mathbf{P}(F)\mathbf{P}(G)$; one also needs E, F, G to be pairwise independent, thus for instance $\mathbf{P}(E \wedge F) = \mathbf{P}(E)\mathbf{P}(F)$. If $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}_3$ are jointly unconditionally independent, then $\mathcal{B}_1 \vee \mathcal{B}_3$ and $\mathcal{B}_2 \vee \mathcal{B}_3$ are conditionally independent over $\mathcal{B}_3$, even though they are typically not unconditionally independent. On the other hand, $\mathcal{B}_1$ and $\mathcal{B}_2$ are both unconditionally independent, and conditionally independent over $\mathcal{B}_3$.
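Here is a concrete instance of the remark about three events, as a short Python sketch using the uniform measure on {1, ..., 8} (our own choice of example). Take E = F = {1, 2, 3, 4} and G = {1, 5, 6, 7}. The triple identity holds, but E and F are not independent. Accordingly the factors $\mathcal{B}[E], \mathcal{B}[F], \mathcal{B}[G]$ are not jointly independent either.

```python
from fractions import Fraction

omega = set(range(1, 9))
prob = {x: Fraction(1, 8) for x in omega}


def P(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return sum(prob[x] for x in event)


E = {1, 2, 3, 4}
F = {1, 2, 3, 4}
G = {1, 5, 6, 7}

triple_holds = P(E & F & G) == P(E) * P(F) * P(G)          # 1/8 == 1/8
pairwise_EF = P(E & F) == P(E) * P(F)                      # 1/2 versus 1/4
# Joint independence of B[E], B[F], B[G] would also need the identity
# with G replaced by its complement.
with_complement = P(E & F & (omega - G)) == P(E) * P(F) * P(omega - G)  # 3/8 versus 1/8
```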

Example A.25. Let $x_1, x_2, x_3$ be three elements chosen uniformly and independently at random from {0, 1}. Then the events $x_1 = x_3$ and $x_2 = x_3$ are unconditionally independent, but they are not relatively independent conditioning on the factor $\mathcal{B}[x_1 = x_2]$. Thus we see that unconditional independence is neither stronger nor weaker than relative independence.
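Example A.25 can be verified by enumerating the eight equally likely outcomes, as in the Python sketch below (our own illustration). On the atom $\{x_1 = x_2\}$ the two events coincide, so their conditional probabilities there are 1/2 each while their intersection also has conditional probability 1/2 rather than 1/4.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product([0, 1], repeat=3))   # (x1, x2, x3), each with mass 1/8
prob = {w: Fraction(1, 8) for w in outcomes}


def P(event):
    """Probability of a set of outcomes."""
    return sum(prob[w] for w in event)


E = {w for w in outcomes if w[0] == w[2]}    # x1 = x3
F = {w for w in outcomes if w[1] == w[2]}    # x2 = x3
C = {w for w in outcomes if w[0] == w[1]}    # the atom {x1 = x2} of B[x1 = x2]

unconditional = P(E & F) == P(E) * P(F)
# Relative independence would require this identity on the atom C (see (15)).
on_atom = P(E & F & C) * P(C) == P(E & C) * P(F & C)
```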

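Taking expectations in (15) we obtain

$$\mathbf{E}(f_1 f_2) = \mathbf{E}(\mathbf{E}(f_1|\mathcal{B}) f_2) = \mathbf{E}(f_1 \mathbf{E}(f_2|\mathcal{B})) = \mathbf{E}(\mathbf{E}(f_1|\mathcal{B})\mathbf{E}(f_2|\mathcal{B})) \qquad (16)$$

whenever $\mathcal{B}_1, \mathcal{B}_2$ are relatively independent conditioning on $\mathcal{B}$, $f_1 \in L^\infty(\mathcal{B}_1)$ and $f_2 \in L^\infty(\mathcal{B}_2)$.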

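There are several equivalent formulations of relative independence.

Lemma A.26. Let $\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}$ be factors. Then the following are equivalent:

(i) $\mathcal{B}_1$ and $\mathcal{B}_2$ are relatively independent conditioning on $\mathcal{B}$.

(ii) We have $\mathbf{E}(f_1|\mathcal{B} \vee \mathcal{B}_2) = \mathbf{E}(f_1|\mathcal{B})$ almost surely for all $f_1 \in L^2(\mathcal{B}_1)$.

(iii) We have $\|\mathbf{E}(f_1|\mathcal{B} \vee \mathcal{B}_2)\|_{L^2} = \|\mathbf{E}(f_1|\mathcal{B})\|_{L^2}$ for all $f_1 \in L^2(\mathcal{B}_1)$.

(iv) We have $\|\mathbf{P}(E_1|\mathcal{B} \vee \mathcal{B}_2)\|_{L^2} = \|\mathbf{P}(E_1|\mathcal{B})\|_{L^2}$ for all $E_1 \in \mathcal{B}_1$.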

Proof. The equivalence of (ii) and (iii) follows from Lemma A.12. The equivalence of (iii) and (iv) follows from Lemma A.18, linearity, and a standard limiting argument.

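To see that (ii) implies (i), observe for $f_1 \in L^\infty(\mathcal{B}_1)$ and $f_2 \in L^\infty(\mathcal{B}_2)$ that

$$\begin{aligned} \mathbf{E}(f_1 f_2|\mathcal{B}) &= \mathbf{E}(\mathbf{E}(f_1 f_2|\mathcal{B} \vee \mathcal{B}_2)|\mathcal{B}) \\ &= \mathbf{E}(\mathbf{E}(f_1|\mathcal{B} \vee \mathcal{B}_2) f_2|\mathcal{B}) \\ &= \mathbf{E}(\mathbf{E}(f_1|\mathcal{B}) f_2|\mathcal{B}) \\ &= \mathbf{E}(f_1|\mathcal{B})\mathbf{E}(f_2|\mathcal{B}) \end{aligned}$$

where we have used the module property twice.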

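Finally, we show that (i) implies (iv). We observe from (i) and the module property that

$$\mathbf{E}(f_1 f_2 h|\mathcal{B}) = \mathbf{E}(f_1|\mathcal{B})\mathbf{E}(f_2 h|\mathcal{B})$$

whenever $f_1 \in L^\infty(\mathcal{B}_1)$, $f_2 \in L^\infty(\mathcal{B}_2)$, and $h \in L^\infty(\mathcal{B})$. Taking linear combinations and using limiting arguments we conclude that

$$\mathbf{E}(f_1 g|\mathcal{B}) = \mathbf{E}(f_1|\mathcal{B})\mathbf{E}(g|\mathcal{B})$$

whenever $g \in L^\infty(\mathcal{B} \vee \mathcal{B}_2)$. We take expectations and obtain

$$\mathbf{E}(f_1 g) = \mathbf{E}(\mathbf{E}(f_1|\mathcal{B})\mathbf{E}(g|\mathcal{B})).$$

Applying this with $f_1 := \mathbf{I}(E_1)$ and $g := \mathbf{P}(E_1|\mathcal{B} \vee \mathcal{B}_2)$ we obtain

$$\|\mathbf{P}(E_1|\mathcal{B} \vee \mathcal{B}_2)\|_{L^2}^2 = \mathbf{E}(\mathbf{I}(E_1)\mathbf{P}(E_1|\mathcal{B} \vee \mathcal{B}_2)) = \mathbf{E}(\mathbf{P}(E_1|\mathcal{B})\mathbf{E}(\mathbf{P}(E_1|\mathcal{B} \vee \mathcal{B}_2)|\mathcal{B})) = \|\mathbf{P}(E_1|\mathcal{B})\|_{L^2}^2,$$

and (iv) follows. ■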

Now we can observe the following stability properties concerning relative indepen- 
dence. 

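Proposition A.27. Let $\mathcal{B}_1, \mathcal{B}_2$ be two factors which are relatively independent conditioning on another factor $\mathcal{B}$.

(i) (Monotonicity) If $\mathcal{B}'_1$ is a factor of $\mathcal{B}_1$ and $\mathcal{B}'_2$ is a factor of $\mathcal{B}_2$, then $\mathcal{B}'_1$ and $\mathcal{B}'_2$ are relatively independent conditioning on $\mathcal{B}$.

(ii) (Absorption) $\mathcal{B}_1 \vee \mathcal{B}$ and $\mathcal{B}_2 \vee \mathcal{B}$ are relatively independent conditioning on $\mathcal{B}$.

(iii) (Gluing) Let $\mathcal{B}_3$ be a σ-algebra which is relatively independent of $\mathcal{B}_1 \vee \mathcal{B}_2$ conditioning on $\mathcal{B}$. Then $\mathcal{B}_1$ is relatively independent of $\mathcal{B}_2 \vee \mathcal{B}_3$ conditioning on $\mathcal{B}$.

(iv) (Factors do not affect relative independence) If $\mathcal{B}'_1$ is a factor of $\mathcal{B}_1$ and $\mathcal{B}'_2$ is a factor of $\mathcal{B}_2$, then $\mathcal{B}_1$ and $\mathcal{B}_2$ are relatively independent conditioning on $\mathcal{B} \vee \mathcal{B}'_1 \vee \mathcal{B}'_2$.

(v) (Independent information does not affect relative independence) Let $\mathcal{B}_3$ be a σ-algebra which is independent of $\mathcal{B} \vee \mathcal{B}_1 \vee \mathcal{B}_2$. Then $\mathcal{B}_1$ is relatively independent of $\mathcal{B}_2 \vee \mathcal{B}_3$ conditioning on $\mathcal{B}$.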

Proof. The claim (i) is trivial. To prove (ii), observe from symmetry and iteration that it suffices to show that $\mathcal{B}_1 \vee \mathcal{B}$ and $\mathcal{B}_2$ are relatively independent conditioning on $\mathcal{B}$. But this follows from two applications of Lemma A.26.

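To prove (iii), it suffices by Lemma A.26 (and symmetry) to show that
$$\mathbf{E}(h|\mathcal{B}\vee\mathcal{B}_1) = \mathbf{E}(h|\mathcal{B})$$
for all $h \in L^\infty(\mathcal{B}_2 \vee \mathcal{B}_3)$. By density it suffices to show that
$$\mathbf{E}(f_2 f_3|\mathcal{B}\vee\mathcal{B}_1) = \mathbf{E}(f_2 f_3|\mathcal{B})$$
for all $f_2 \in L^\infty(\mathcal{B}_2)$ and $f_3 \in L^\infty(\mathcal{B}_3)$. But this follows from the relative independence hypotheses and the module property:
$$\begin{aligned}
\mathbf{E}(f_2 f_3|\mathcal{B}\vee\mathcal{B}_1) &= \mathbf{E}(\mathbf{E}(f_2 f_3|\mathcal{B}\vee\mathcal{B}_1\vee\mathcal{B}_2)|\mathcal{B}\vee\mathcal{B}_1) \\
&= \mathbf{E}(f_2\,\mathbf{E}(f_3|\mathcal{B}\vee\mathcal{B}_1\vee\mathcal{B}_2)|\mathcal{B}\vee\mathcal{B}_1) \\
&= \mathbf{E}(f_2\,\mathbf{E}(f_3|\mathcal{B})|\mathcal{B}\vee\mathcal{B}_1) \\
&= \mathbf{E}(f_2|\mathcal{B}\vee\mathcal{B}_1)\,\mathbf{E}(f_3|\mathcal{B}) \\
&= \mathbf{E}(f_2|\mathcal{B})\,\mathbf{E}(f_3|\mathcal{B}) \\
&= \mathbf{E}(f_2 f_3|\mathcal{B}).
\end{aligned}$$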



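Now we prove (iv). By symmetry and iteration it will suffice to show that $\mathcal{B}_1$ and $\mathcal{B}_2$ are relatively independent conditioning on $\mathcal{B}\vee\mathcal{B}'_1$. From Lemma A.26 we already have
$$\|\mathbf{E}(f_2|\mathcal{B})\|_{L^2} = \|\mathbf{E}(f_2|\mathcal{B}\vee\mathcal{B}_1)\|_{L^2}$$
for all $f_2 \in L^\infty(\mathcal{B}_2)$. Since $\mathcal{B} \subseteq \mathcal{B}\vee\mathcal{B}'_1 \subseteq \mathcal{B}\vee\mathcal{B}_1$, Lemma A.12 then gives
$$\|\mathbf{E}(f_2|\mathcal{B})\|_{L^2} = \|\mathbf{E}(f_2|\mathcal{B}\vee\mathcal{B}'_1)\|_{L^2} = \|\mathbf{E}(f_2|\mathcal{B}\vee\mathcal{B}_1)\|_{L^2},$$
and the claim follows from another application of Lemma A.26, since $(\mathcal{B}\vee\mathcal{B}'_1)\vee\mathcal{B}_1 = \mathcal{B}\vee\mathcal{B}_1$.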

Finally, we prove (v). If $\mathcal{B}_3$ is independent of $\mathcal{B}\vee\mathcal{B}_1\vee\mathcal{B}_2$, then by the monotonicity and factor properties (i), (iv) we conclude that $\mathcal{B}_3$ is relatively independent of $\mathcal{B}_1 \vee \mathcal{B}_2$ conditioning on $\mathcal{B}$. The claim (v) then follows from the gluing property (iii). ■
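The role of the hypothesis in the gluing property (iii) can be probed the same way. The Python sketch below (the sample spaces, probabilities and the helper `glued` are hypothetical) tests the identity $\mathbf{E}(f_2 f_3|\mathcal{B}\vee\mathcal{B}_1) = \mathbf{E}(f_2 f_3|\mathcal{B})$ from the proof of (iii) in two toy models. In the first, $X_1, X_2, X_3$ are conditionally independent given $Z$, so all hypotheses hold. In the second, $X_3 = X_1 \oplus X_2$ with $\mathcal{B}$ trivial: $\mathcal{B}_3$ is independent of $\mathcal{B}_1$ and of $\mathcal{B}_2$ separately, but not of $\mathcal{B}_1 \vee \mathcal{B}_2$, and the conclusion fails.

```python
from fractions import Fraction
from itertools import product

def cond_exp(f, key, prob):
    """E(f | sigma(key)) on a finite space: average f over each atom {key = const}."""
    num, den = {}, {}
    for w, p in prob.items():
        k = key(w)
        num[k] = num.get(k, 0) + p * f(w)
        den[k] = den.get(k, 0) + p
    return lambda w: num[key(w)] / den[key(w)]

def glued(prob, key_b, key_b1, f2, f3):
    """Check E(f2 f3 | B v B1) == E(f2 f3 | B) at every outcome of positive probability."""
    g = lambda w: f2(w) * f3(w)
    fine = cond_exp(g, lambda w: (key_b(w), key_b1(w)), prob)
    coarse = cond_exp(g, key_b, prob)
    return all(fine(w) == coarse(w) for w, p in prob.items() if p > 0)

def bern(q, v):
    return q if v == 1 else 1 - q

# Model 1 (hypothetical): w = (z, x1, x2, x3); Z is a fair bit and X1, X2, X3 are
# conditionally independent given Z, so every hypothesis of (iii) holds.
Q = {1: {0: Fraction(1, 3), 1: Fraction(3, 4)},  # Q[i][z] = P(Xi = 1 | Z = z)
     2: {0: Fraction(1, 2), 1: Fraction(1, 5)},
     3: {0: Fraction(2, 3), 1: Fraction(1, 4)}}
PROB1 = {w: Fraction(1, 2) * bern(Q[1][w[0]], w[1]) * bern(Q[2][w[0]], w[2]) * bern(Q[3][w[0]], w[3])
         for w in product([0, 1], repeat=4)}
ok_model1 = glued(PROB1, key_b=lambda w: w[0], key_b1=lambda w: w[1],
                  f2=lambda w: w[2], f3=lambda w: 2 + w[3])

# Model 2 (hypothetical): w = (x1, x2, x3) with X1, X2 fair independent bits and
# X3 = X1 xor X2; B is trivial. B3 is independent of B1 and of B2 separately,
# but not relatively independent of B1 v B2, and the conclusion of (iii) fails.
PROB2 = {(a, b, a ^ b): Fraction(1, 4) for a, b in product([0, 1], repeat=2)}
ok_model2 = glued(PROB2, key_b=lambda w: 0, key_b1=lambda w: w[0],
                  f2=lambda w: w[1], f3=lambda w: w[2])
print(ok_model1, ok_model2)
```

In the second model $\mathbf{E}(X_2 X_3) = 1/4$, while $\mathbf{E}(X_2 X_3 | X_1)$ equals $1/2$ or $0$ according to whether $X_1 = 0$ or $1$, so pairwise independence alone is not enough for gluing.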



Appendix B. Connection with recurrence theorems 



We have just seen how infinitary probabilistic statements such as Theorem 4.2 can imply finitary graph statements such as Lemma 4.1; later we shall see that one can also deduce finitary hypergraph statements in this manner. It is also well known (see [24], [5], [6], [22], [23], [12], [25], [31]) that these graph and hypergraph statements can in turn be used to deduce density results such as Szemeredi's theorem. This in turn is known, by the Furstenberg correspondence principle, to be equivalent to results such as the Furstenberg recurrence theorem. Concatenating all these implications, one thus expects results such as Theorem 4.2 to be capable of implying results such as Theorem 2.2 directly, without the need to pass back and forth between the infinitary and finitary settings.

Somewhat surprisingly, this goal appears to be difficult to achieve; the best the author was able to do was simply to compose the various implications discussed above. For the sake of completeness, we sketch a special case of this connection here, but it is puzzling that there seems to be little "synergy" between these two infinitary results, despite their similarity. As no major new features appear to emerge in this connection, we will skip over some of the details.

One can demonstrate the connection using the Furstenberg recurrence theorem (Theorem 2.2), but it will be slightly more convenient to instead work with the following variant:

Theorem B.1 (Furstenberg-Katznelson recurrence theorem, special case). [9] Let $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$ be a probability space. Let $S, T : \Omega \to \Omega$ be two commuting probability-preserving bi-measurable maps. Then for all events $A \in \mathcal{B}_{\max}$ with $\mathbf{P}(A) > 0$, we have
$$\liminf_{N\to\infty} \frac{1}{2N+1}\sum_{n=-N}^{N} \mathbf{P}(A \wedge T^n A \wedge S^n A) > 0.$$

This theorem is equivalent to the assertion that any subset of $\mathbb{Z}^2$ with positive upper density contains infinitely many right-angled triangles $(x, y), (x + r, y), (x, y + r)$, a result first obtained by Ajtai and Szemeredi [1]. In [25] it was observed that this theorem follows from the triangle removal lemma. Setting $S := T^2$ we obtain the $k = 3$ case of Theorem 2.2. The full version of the Furstenberg-Katznelson recurrence theorem allows for an arbitrary number of commuting shifts, and can be treated by a modification of the arguments presented here.
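To make the averages in Theorem B.1 concrete, here is a Python sketch on a hypothetical toy system: $\Omega = \mathbb{Z}_M \times \mathbb{Z}_M$ with uniform measure, $T(a,b) = (a+1,b)$, $S(a,b) = (a,b+1)$, and a randomly chosen set $A$. In this model $\mathbf{P}(A \wedge T^nA \wedge S^nA)$ is the density of corners $(a-n,b), (a,b), (a,b-n)$ in $A$, and replacing $S$ by $T^2$ gives the density of three-term progressions, as in the reduction to the $k = 3$ case. On such a periodic system positivity is trivial (the terms with $n \equiv 0 \pmod M$ equal $\mathbf{P}(A)$), so the sketch only illustrates the quantities involved, not the difficulty of the theorem; the function names are illustrative.

```python
import random

def corner_density(A, M, n):
    """P(A ∧ T^n A ∧ S^n A) on Z_M x Z_M, with T(a,b) = (a+1,b), S(a,b) = (a,b+1).

    A point y = (a, b) lies in T^n A iff T^{-n} y = (a - n, b) lies in A.
    """
    hits = sum(1 for (a, b) in A if ((a - n) % M, b) in A and (a, (b - n) % M) in A)
    return hits / M**2

def ap_density(A, M, n):
    """Same quantity with S replaced by T^2, i.e. P(A ∧ T^n A ∧ T^{2n} A)."""
    hits = sum(1 for (a, b) in A if ((a - n) % M, b) in A and ((a - 2 * n) % M, b) in A)
    return hits / M**2

def recurrence_average(density, A, M, N):
    """(1/(2N+1)) * sum over n = -N..N of density(A, M, n)."""
    return sum(density(A, M, n) for n in range(-N, N + 1)) / (2 * N + 1)

rng = random.Random(0)
M = 31
A = {(a, b) for a in range(M) for b in range(M) if rng.random() < 0.3}
for N in (5, 20, 80):
    print(N, recurrence_average(corner_density, A, M, N), recurrence_average(ap_density, A, M, N))
```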

To transfer this theorem to a setting where Theorem 4.2 is applicable, we will have to go through essentially all of the machinery used in the graph correspondence principle. It is convenient to work not with graphs on $\mathbb{N}$, but rather with tripartite graphs connecting three copies of $\mathbb{Z}$:

Definition B.2 (Tripartite graph universal space). A tripartite infinite graph is a sextuple $G = (\mathbb{Z}, \mathbb{Z}, \mathbb{Z}, E_{12}, E_{23}, E_{31})$ where $E_{12}, E_{23}, E_{31}$ are subsets of $\mathbb{Z}^2$. Let $\Omega'$ denote the space of all tripartite infinite graphs. On this space, we introduce the events $A_{ij,k_i,k_j}$ for $ij = 12, 23, 31$ and $k_i, k_j \in \mathbb{Z}$ by $A_{ij,k_i,k_j} := \{G \in \Omega' : (k_i, k_j) \in E_{ij}\}$, and let $\mathcal{B}'_{\max}$ be the $\sigma$-algebra generated by the $A_{ij,k_i,k_j}$. We also introduce the regular algebra generated by the $A_{ij,k_i,k_j}$. For any three permutations $\sigma_1, \sigma_2, \sigma_3 : \mathbb{Z} \to \mathbb{Z}$ we can define an action of $(\sigma_1, \sigma_2, \sigma_3)$ on $\mathcal{B}'_{\max}$ by mapping $A_{ij,k_i,k_j}$ to $A_{ij,\sigma_i(k_i),\sigma_j(k_j)}$. For any subsets $I_1, I_2, I_3$ of $\mathbb{Z}$, we define $\mathcal{B}'_{I_1,I_2,I_3}$ to be the factor of $\mathcal{B}'_{\max}$ generated by the events $A_{ij,k_i,k_j}$ where $ij = 12, 23, 31$, $k_i \in I_i$, and $k_j \in I_j$.

Now we embed the system in Theorem B.1 into this universal space.

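Definition B.3 (Tripartite graph universal embedding). Let $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$, $S$, $T$, $A$ be as in Theorem B.1. Let $N \geq 1$ be a natural number. We introduce the probability space $(\Omega^{(N)}, \mathcal{B}^{(N)}_{\max}, \mathbf{P}^{(N)})$, defined as the space associated to sampling three infinite sequences $(n_{i,k_i})_{k_i \in \mathbb{Z}}$ for $i = 1, 2, 3$ uniformly and independently at random from $[N] = \{1, \ldots, N\}$. Thus the product space $(\Omega \times \Omega^{(N)}, \mathcal{B}_{\max} \times \mathcal{B}^{(N)}_{\max}, \mathbf{P} \times \mathbf{P}^{(N)})$ represents the independent sampling of a point $x$ from $\Omega$, together with the three sequences $n_{i,k_i} \in [N]$ for $i = 1, 2, 3$ and $k_i \in \mathbb{Z}$. For any such $x$ and $n_{i,k_i}$, we associate an infinite tripartite graph $G = (\mathbb{Z}, \mathbb{Z}, \mathbb{Z}, E_{12}, E_{23}, E_{31})$ in $\Omega'$ by setting
$$\begin{aligned}
E_{12} &:= \{(k_1, k_2) \in \mathbb{Z} \times \mathbb{Z} : T^{n_{1,k_1}} S^{n_{2,k_2}} x \in A\} \\
E_{23} &:= \{(k_2, k_3) \in \mathbb{Z} \times \mathbb{Z} : T^{n_{3,k_3} - n_{2,k_2}} S^{n_{2,k_2}} x \in A\} \\
E_{31} &:= \{(k_3, k_1) \in \mathbb{Z} \times \mathbb{Z} : T^{n_{1,k_1}} S^{n_{3,k_3} - n_{1,k_1}} x \in A\}.
\end{aligned}$$
This is a measurable map from $\Omega \times \Omega^{(N)}$ to $\Omega'$ (the inverse image of $A_{12,k_1,k_2}$ is the measurable event $T^{n_{1,k_1}} S^{n_{2,k_2}} x \in A$, and similarly for the other two classes of generating events), and so we can push forward the measure $\mathbf{P} \times \mathbf{P}^{(N)}$ to a measure on $((\Omega \times \Omega^{(N)}) \times \Omega', \mathcal{B}_{\max} \times \mathcal{B}^{(N)}_{\max} \times \mathcal{B}'_{\max})$, which by abuse of notation we will also call $\mathbf{P} \times \mathbf{P}^{(N)}$.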
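The embedding can be simulated on a finite window of vertices. The Python sketch below uses a hypothetical torus system ($\Omega = \mathbb{Z}_M \times \mathbb{Z}_M$, $T(a,b) = (a+1,b)$, $S(a,b) = (a,b+1)$) as a stand-in for $(\Omega, \mathcal{B}_{\max}, \mathbf{P})$; the window size `K` and the helper names are illustrative assumptions. It builds $E_{12}, E_{23}, E_{31}$ on $\{-K, \ldots, K\}$ from the edge rules $T^{n_{1,k_1}}S^{n_{2,k_2}}x \in A$, $T^{n_{3,k_3}-n_{2,k_2}}S^{n_{2,k_2}}x \in A$ and $T^{n_{1,k_1}}S^{n_{3,k_3}-n_{1,k_1}}x \in A$, samples $(n_{1,0}, n_{2,0})$ conditioned on $n_{3,0} = n_{1,0} + n_{2,0}$, and checks that on this diagonal the three edge conditions at vertex $0$ of each class collapse to the single condition $T^{n_{1,0}}S^{n_{2,0}}x \in A$, which is the step used at the end of the proof of Theorem B.1.

```python
import random

def shift(x, t, s, M):
    """T^t S^s x on Z_M x Z_M, with T(a, b) = (a + 1, b) and S(a, b) = (a, b + 1)."""
    a, b = x
    return ((a + t) % M, (b + s) % M)

def tripartite_graph(x, n, A, M, K):
    """Edge sets E12, E23, E31 restricted to the vertex window {-K, ..., K}.

    n[i][k] plays the role of n_{i,k}; A is a set of torus points.
    """
    ks = range(-K, K + 1)
    E12 = {(k1, k2) for k1 in ks for k2 in ks
           if shift(x, n[1][k1], n[2][k2], M) in A}
    E23 = {(k2, k3) for k2 in ks for k3 in ks
           if shift(x, n[3][k3] - n[2][k2], n[2][k2], M) in A}
    E31 = {(k3, k1) for k3 in ks for k1 in ks
           if shift(x, n[1][k1], n[3][k3] - n[1][k1], M) in A}
    return E12, E23, E31

def sample_on_diagonal(M, N, K, rng):
    """Sample x and the sequences n_{i,k} in [N], conditioned on n_{3,0} = n_{1,0} + n_{2,0}."""
    x = (rng.randrange(M), rng.randrange(M))
    n = {i: {k: rng.randint(1, N) for k in range(-K, K + 1)} for i in (1, 2, 3)}
    # Rejection step: the conditioning event forces n_{1,0} + n_{2,0} <= N.
    while n[1][0] + n[2][0] > N:
        n[1][0], n[2][0] = rng.randint(1, N), rng.randint(1, N)
    n[3][0] = n[1][0] + n[2][0]
    return x, n

rng = random.Random(0)
M, N, K = 11, 8, 2
A = {(a, b) for a in range(M) for b in range(M) if rng.random() < 0.4}

trials, hits = 2000, 0
for _ in range(trials):
    x, n = sample_on_diagonal(M, N, K, rng)
    E12, E23, E31 = tripartite_graph(x, n, A, M, K)
    triangle = (0, 0) in E12 and (0, 0) in E23 and (0, 0) in E31
    # On the diagonal all three edge conditions reduce to T^{n_{1,0}} S^{n_{2,0}} x in A.
    assert triangle == (shift(x, n[1][0], n[2][0], M) in A)
    hits += triangle
print(hits / trials, len(A) / M**2)  # empirical triangle frequency vs. P(A)
```

Because $x$ is uniform and independent of the $n_{i,k}$, the empirical triangle frequency should fluctuate around $\mathbf{P}(A)$, mirroring the identity that produces the final contradiction.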

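A computation (using the probability-preserving and commuting nature of $T$ and $S$) shows that
$$\begin{aligned}
\mathbf{P} \times \mathbf{P}^{(N)}(A_{12,0,0} \wedge A_{23,0,0} \wedge A_{31,0,0})
&= \frac{1}{N^3} \sum_{n_1, n_2, n_3 \in [N]} \mathbf{P}(T^{-n_1} S^{-n_2} A \wedge T^{n_2 - n_3} S^{-n_2} A \wedge T^{-n_1} S^{n_1 - n_3} A) \\
&= \frac{1}{N^3} \sum_{n_1, n_2, n_3 \in [N]} \mathbf{P}(A \wedge T^{n_1 + n_2 - n_3} A \wedge S^{n_1 + n_2 - n_3} A) \\
&\leq \frac{1}{N} \sum_{n=-2N}^{2N} \mathbf{P}(A \wedge T^n A \wedge S^n A),
\end{aligned}$$
where in the last step we used that each value of $n = n_1 + n_2 - n_3$ lies in $[-2N, 2N]$ and arises from at most $N^2$ triples.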
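The two equalities and the final bound can be checked by exact arithmetic on a hypothetical torus system ($\Omega = \mathbb{Z}_M \times \mathbb{Z}_M$, $T(a,b) = (a+1,b)$, $S(a,b) = (a,b+1)$). In the Python sketch below, `triangle_density` evaluates the left-hand side directly from the edge rules $T^{n_1}S^{n_2}x \in A$, $T^{n_3-n_2}S^{n_2}x \in A$, $T^{n_1}S^{n_3-n_1}x \in A$, `shifted` evaluates the middle expression, and `bound` the right-hand side, where $T^mA$ denotes the image $\{y : T^{-m}y \in A\}$. Exact equality of the first two reflects the substitution $y = T^{n_1}S^{n_2}x$; the function names and parameters are illustrative.

```python
import random
from fractions import Fraction

def corner_density(A, M, m):
    """P(A ∧ T^m A ∧ S^m A) on Z_M x Z_M, where T^m A = {y : T^{-m} y in A}."""
    hits = sum(1 for (a, b) in A if ((a - m) % M, b) in A and (a, (b - m) % M) in A)
    return Fraction(hits, M * M)

def triangle_density(A, M, N):
    """P x P^(N)(A_{12,0,0} ∧ A_{23,0,0} ∧ A_{31,0,0}) straight from the three edge rules."""
    total = Fraction(0)
    for n1 in range(1, N + 1):
        for n2 in range(1, N + 1):
            for n3 in range(1, N + 1):
                hits = sum(1 for a in range(M) for b in range(M)
                           if ((a + n1) % M, (b + n2) % M) in A          # T^{n1} S^{n2} x
                           and ((a + n3 - n2) % M, (b + n2) % M) in A    # T^{n3-n2} S^{n2} x
                           and ((a + n1) % M, (b + n3 - n1) % M) in A)   # T^{n1} S^{n3-n1} x
                total += Fraction(hits, M * M)
    return total / N**3

rng = random.Random(1)
M, N = 7, 4
A = {(a, b) for a in range(M) for b in range(M) if rng.random() < 0.5}
triples = [(n1, n2, n3) for n1 in range(1, N + 1)
           for n2 in range(1, N + 1) for n3 in range(1, N + 1)]

direct = triangle_density(A, M, N)
shifted = sum(corner_density(A, M, n1 + n2 - n3) for n1, n2, n3 in triples) / N**3
bound = sum(corner_density(A, M, n) for n in range(-2 * N, 2 * N + 1)) / N
print(direct == shifted, direct <= bound)
```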

Thus to prove Theorem B.1 it will suffice to show that
$$\liminf_{N \to \infty} \mathbf{P} \times \mathbf{P}^{(N)}(A_{12,0,0} \wedge A_{23,0,0} \wedge A_{31,0,0}) > 0.$$
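For completeness, here is one way to see why this suffices, assuming the normalisation $\frac{1}{2N+1}\sum_{n=-N}^{N}$ in Theorem B.1 and the bound by $\frac{1}{N}\sum_{n=-2N}^{2N}$ at the end of the computation above. Write $a_n := \mathbf{P}(A \wedge T^n A \wedge S^n A) \geq 0$ and suppose the triangle density is at least $c > 0$ for all $N \geq N_0 \geq 1$:

```latex
\begin{aligned}
&\sum_{|n| \le 2N} a_n \;\ge\; cN \quad \text{for all } N \ge N_0; \\
&\text{for } L \ge 2N_0 \text{ put } N := \lfloor L/2 \rfloor \ge N_0, \text{ so that } 2N \le L \le 2N+1 \text{ and} \\
&\frac{1}{2L+1} \sum_{|n| \le L} a_n \;\ge\; \frac{1}{4N+3} \sum_{|n| \le 2N} a_n \;\ge\; \frac{cN}{4N+3} \;\ge\; \frac{c}{7}.
\end{aligned}
```

Hence the $\liminf$ in Theorem B.1 is at least $c/7 > 0$; the constant $7$ is of no importance.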

Suppose this were false. Then one can find a sequence $N^{(m)}$ of natural numbers, going to infinity as $m \to \infty$, such that
$$\lim_{m \to \infty} \mathbf{P} \times \mathbf{P}^{(N^{(m)})}(A_{12,0,0} \wedge A_{23,0,0} \wedge A_{31,0,0}) = 0.$$

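By applying Lemma A.15, we can pass to a subsequence if necessary, and obtain a limiting probability measure $\mathbf{P}'$ on $(\Omega', \mathcal{B}'_{\max})$ with the weak convergence property
$$\lim_{m \to \infty} \mathbf{P} \times \mathbf{P}^{(N^{(m)})}(E) = \mathbf{P}'(E) \quad \text{for all } E \text{ in the regular algebra}.$$
The individual measures $\mathbf{P} \times \mathbf{P}^{(N)}$ can easily be verified to be invariant under triples of permutations $(\sigma_1, \sigma_2, \sigma_3)$, and so the limiting measure $\mathbf{P}'$ is also.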

By adapting the arguments used to prove Lemma 3.5, one can exploit the above invariance to establish the following relative independence property: if $J_1, J_2, J_3$ and $I_{i,1}, \ldots, I_{i,n}$ (for $i = 1, 2, 3$) are subsets of $\mathbb{Z}$ obeying the analogue of the hypotheses of Lemma 3.5, then the factors $\mathcal{B}'_{J_1,J_2,J_3}$ and $\bigvee_{j=1}^{n} \mathcal{B}'_{I_{1,j},I_{2,j},I_{3,j}}$ are relatively independent conditioning on $\bigvee_{j=1}^{n} \mathcal{B}'_{J_1 \cap I_{1,j},\, J_2 \cap I_{2,j},\, J_3 \cap I_{3,j}}$, with respect to the probability measure $\mathbf{P}'$. We omit the details, as they are essentially the same as in the proof of Lemma 3.5 except for minor notational complications.

We now apply Theorem 4.2 on $(\Omega', \mathcal{B}'_{\max}, \mathbf{P}')$, with $J := \{1, 2, 3\}$, $\mathcal{H}_{\max} := \{e \subseteq J : |e| \leq 2\}$, $E_e$ set equal to $A_{ij,0,0}$ if $e = \{i, j\}$ for some $ij = 12, 23, 31$, and $E_e = \Omega'$ otherwise, and with $\mathcal{B}_e$ set equal to $\mathcal{B}'_{I_1,I_2,I_3}$, where $I_i$ is equal to $\mathbb{Z}$ if $i \in e$ and $\mathbb{Z} \setminus \{0\}$ if $i \notin e$. Thus for instance $\mathcal{B}_{\{1,2\}} = \mathcal{B}'_{\mathbb{Z},\mathbb{Z},\mathbb{Z}\setminus\{0\}}$. The hypotheses of the theorem are easily verified, and by arguing as in the proof of Lemma 4.1 (or Lemma 4.4) we can find regular events $F_{ij} \in \mathcal{B}_{\{i,j\}}$ for $ij = 12, 23, 31$ obeying (11) and
$$\mathbf{P}'(A_{ij,0,0} \setminus F_{ij}) < \eta/10 \quad \text{for } ij = 12, 23, 31,$$
where $\eta := \mathbf{P}(A) > 0$ is the probability of the original event $A$. In particular, for all sufficiently large $m$ we have
$$\mathbf{P} \times \mathbf{P}^{(N^{(m)})}(A_{ij,0,0} \setminus F_{ij}) < \eta/10 \quad \text{for } ij = 12, 23, 31.$$

Now recall that the variables $n_{1,0}, n_{2,0}, n_{3,0}$ are independently and uniformly distributed on the interval $[N^{(m)}]$. For any fixed $n_{1,0}, n_{2,0}$, the probability that $n_{3,0}$ equals $n_{1,0} + n_{2,0}$ will equal $1/N^{(m)}$ approximately half the time, and $0$ the other half of the time. Since the event $A_{12,0,0} \setminus F_{12}$ is independent of $n_{3,0}$, we thus conclude from Bayes' formula that
$$\mathbf{P} \times \mathbf{P}^{(N^{(m)})}(A_{12,0,0} \setminus F_{12} \mid n_{3,0} = n_{1,0} + n_{2,0}) < \eta/5$$
for $m$ sufficiently large. Similar arguments in fact give
$$\mathbf{P} \times \mathbf{P}^{(N^{(m)})}(A_{ij,0,0} \setminus F_{ij} \mid n_{3,0} = n_{1,0} + n_{2,0}) < \eta/5 \quad \text{for } ij = 12, 23, 31.$$
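The factor behind the passage from $\eta/10$ to $\eta/5$ can be made explicit. For $n_1, n_2, n_3$ uniform on $[N]$, exactly $N(N-1)/2$ pairs have $n_1 + n_2 \leq N$, so $\mathbf{P}(n_3 = n_1 + n_2) = (N-1)/(2N^2)$, and for an event $X$ independent of $n_3$ this gives $\mathbf{P}(X \mid n_3 = n_1 + n_2) \leq \frac{2N}{N-1}\mathbf{P}(X)$, a factor tending to $2$. The Python sketch below checks both facts by exact counting; for simplicity it only treats events $X$ depending on $(n_1, n_2)$ alone (a hypothetical special case: in the proof $X$ also depends on $x$ and other coordinates, but the same computation applies), and the function names are illustrative. It requires $N \geq 2$, since the diagonal event is empty for $N = 1$.

```python
import random
from fractions import Fraction
from itertools import product

def diagonal_probability(N):
    """P(n3 = n1 + n2) for n1, n2, n3 independent and uniform on [N] = {1, ..., N}."""
    pairs = sum(1 for n1, n2 in product(range(1, N + 1), repeat=2) if n1 + n2 <= N)
    return Fraction(pairs, N**2) * Fraction(1, N)

def conditional_given_diagonal(X, N):
    """P(X | n3 = n1 + n2) for an event X, given as a set of pairs (n1, n2), not involving n3."""
    count = sum(1 for n1, n2 in X if n1 + n2 <= N)
    return Fraction(count, N**3) / diagonal_probability(N)

rng = random.Random(2)
for N in (5, 50, 200):
    X = {(n1, n2) for n1, n2 in product(range(1, N + 1), repeat=2) if rng.random() < 0.1}
    p_x = Fraction(len(X), N**2)
    # Conditional probability versus the worst-case bound (2N/(N-1)) * P(X).
    print(N, float(conditional_given_diagonal(X, N)), float(Fraction(2 * N, N - 1) * p_x))
```

The bound is attained when $X$ is the triangle $\{n_1 + n_2 \leq N\}$ itself, so the factor $2N/(N-1)$ cannot be improved in general.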
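On the other hand, from (11) we have†
$$\mathbf{P} \times \mathbf{P}^{(N^{(m)})}(F_{12} \wedge F_{23} \wedge F_{31} \mid n_{3,0} = n_{1,0} + n_{2,0}) = 0.$$
Combining this with the preceding estimate we see that
$$\mathbf{P} \times \mathbf{P}^{(N^{(m)})}(A_{12,0,0} \wedge A_{23,0,0} \wedge A_{31,0,0} \mid n_{3,0} = n_{1,0} + n_{2,0}) < 3\eta/5.$$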

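However, the left-hand side equals
$$\mathbf{P} \times \mathbf{P}^{(N^{(m)})}\big(T^{n_{1,0}} S^{n_{2,0}} x \in A,\ T^{n_{3,0} - n_{2,0}} S^{n_{2,0}} x \in A,\ T^{n_{1,0}} S^{n_{3,0} - n_{1,0}} x \in A \mid n_{3,0} = n_{1,0} + n_{2,0}\big),$$
which simplifies to $\mathbf{P}(A) = \eta$, since on the event $n_{3,0} = n_{1,0} + n_{2,0}$ all three conditions reduce to $T^{n_{1,0}} S^{n_{2,0}} x \in A$, and $x$ is independent of the $n_{i,0}$ (using the shift invariance). Thus we have $\eta < 3\eta/5$, a contradiction. This proves Theorem B.1. ■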
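The combination step in this proof is a union bound, which may be worth spelling out. Write $\tilde{\mathbf{P}}$ for $\mathbf{P} \times \mathbf{P}^{(N^{(m)})}$ conditioned on $n_{3,0} = n_{1,0} + n_{2,0}$. Any point of $A_{12,0,0} \wedge A_{23,0,0} \wedge A_{31,0,0}$ either lies in $F_{12} \wedge F_{23} \wedge F_{31}$ or lies in $A_{ij,0,0} \setminus F_{ij}$ for some $ij$, so for $m$ sufficiently large:

```latex
\begin{aligned}
\eta &= \tilde{\mathbf{P}}(A_{12,0,0} \wedge A_{23,0,0} \wedge A_{31,0,0}) \\
&\le \tilde{\mathbf{P}}(F_{12} \wedge F_{23} \wedge F_{31}) + \sum_{ij \in \{12, 23, 31\}} \tilde{\mathbf{P}}(A_{ij,0,0} \setminus F_{ij}) \\
&< 0 + 3 \cdot \frac{\eta}{5} = \frac{3\eta}{5}.
\end{aligned}
```

Here the first equality is the collapse of the triangle on the diagonal, the middle term vanishes because $F_{12} \wedge F_{23} \wedge F_{31}$ is empty, and each summand is controlled by the conditional estimate derived from Bayes' formula.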

Remark B.4. At present, the hypergraph regularity method is known to yield the Furstenberg-Katznelson recurrence theorem, but more powerful recurrence theorems, such as the Bergelson-Leibman polynomial recurrence theorem, the Furstenberg-Katznelson IP-Szemeredi theorem, and the Furstenberg-Katznelson density Hales-Jewett theorem, have not yet been successfully obtained by this method (either in the finitary or infinitary settings). It is not clear to the author whether this represents any fundamental limitation of the method. A possible test problem would be the refinement of Szemeredi's theorem that the set of possible differences amongst the arithmetic progressions of a given length is syndetic (has bounded gaps); this was established for instance in [7] by ergodic methods but does not currently have a non-ergodic proof.

†Note how important it is here that the event $F_{12} \wedge F_{23} \wedge F_{31}$ is empty, rather than merely being a null event with respect to $\mathbf{P}'$. In the latter case, the event would have a small but nonzero measure in $\mathbf{P} \times \mathbf{P}^{(N^{(m)})}$, and we would be unable to condition this event on the vanishingly small probability event $n_{3,0} = n_{1,0} + n_{2,0}$ without losing control on the conditional probability. The point is that the constraint $n_{3,0} = n_{1,0} + n_{2,0}$ creates a "diagonal measure" which is singular with respect to $\mathbf{P}'$, and so null events in $\mathbf{P}'$ do not necessarily restrict to null events on the diagonal measure. However, events which have empty intersection with respect to $\mathbf{P}'$ will clearly continue to have empty intersection with respect to the diagonal measure. This robustness with respect to change of measure is what makes Theorem 4.2 (which is basically a mechanism for converting null events to empty events) so powerful.